Downloads: 53

Files in this item:
s40561-023-00271-9.pdf (4.07 MB, Adobe PDF)
Title: Automated labeling of PDF mathematical exercises with word N-grams VSM classification
Authors: Yamauchi, Taisei
Flanagan, Brendan (ORCID: https://orcid.org/0000-0001-7644-997X, unconfirmed)
Nakamoto, Ryosuke
Dai, Yiling (ORCID: https://orcid.org/0000-0001-9900-8763, unconfirmed)
Takami, Kyosuke
Ogata, Hiroaki (ORCID: https://orcid.org/0000-0001-5216-1576, unconfirmed)
Alternative author names: 山内, 大聖
中本, 陵介
戴, 憶菱
緒方, 広明
Keywords: Automatic labeling
Word n-gram
Random forest
Incomplete text classification
Word embedding
Mathematical education
Mathematical education in Japan
Issue date: 18-Oct-2023
Publisher: Springer Nature
Journal title: Smart Learning Environments
Volume: 10
Article number: 51
Abstract: In recent years, smart learning environments have become central to modern education, supporting students and instructors through tools based on prediction and recommendation models. These methods often rely on learning material metadata, such as the knowledge contained in an exercise, which is usually labeled by domain experts, a process that is costly and difficult to scale. Automated labeling eases the workload on experts, as shown in previous studies that applied automatic classification algorithms to research papers and Japanese mathematical exercises. However, these studies did not address fine-grained labeling. Moreover, as materials are used more widely in such systems, paper materials are converted into PDF format, which can lead to incomplete text extraction, and previous research has placed little emphasis on labeling such incomplete mathematical sentences. This study aims to achieve precise automated classification even from incomplete text inputs. To tackle these challenges, we propose a mathematical exercise labeling algorithm that can handle detailed labels, even for incomplete sentences, using word n-grams, and compare it with a state-of-the-art word embedding method. The experimental results show that mono-gram features with Random Forest models achieved the best performance, with macro F-measures of 92.50% and 61.28% for the 24-class and 297-class labeling tasks, respectively. The contribution of this research is to show that the proposed method, based on traditional simple n-grams, can find context-independent similarities in incomplete sentences and outperforms state-of-the-art word embedding methods in specific tasks such as classifying short and incomplete texts.
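The abstract describes a word n-gram vector space model (VSM) combined with a Random Forest classifier, evaluated by macro F-measure. Below is a minimal sketch of that general approach using scikit-learn; the toy exercise texts, label names, and all parameter choices are illustrative assumptions and are not taken from the paper.

```python
# Sketch of word n-gram VSM classification with a Random Forest,
# in the spirit of the method described in the abstract.
# NOT the authors' implementation: data and parameters are hypothetical.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical exercise texts (possibly incomplete, as after PDF extraction)
# paired with hypothetical knowledge labels.
exercises = [
    "solve the quadratic equation x squared minus 5x plus 6",
    "find the derivative of x cubed",
    "quadratic equation with two real roots",
    "derivative of a polynomial function",
    "probability of drawing two red balls",
    "conditional probability of independent events",
]
labels = ["quadratic", "derivative", "quadratic",
          "derivative", "probability", "probability"]

X_train, X_test, y_train, y_test = train_test_split(
    exercises, labels, test_size=0.5, stratify=labels, random_state=0
)

# ngram_range=(1, 1) yields mono-gram (single-word) count features,
# the configuration reported as best performing in the abstract.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

# Evaluate with the macro F-measure, the metric used in the paper.
predictions = model.predict(X_test)
print("Macro F1:", f1_score(y_test, predictions, average="macro"))
```

Widening ngram_range (e.g. to (1, 2)) would add higher-order word n-grams to the feature space, which is the natural knob to turn when comparing against the mono-gram baseline.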
Rights: © The Author(s) 2023
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
URI: http://hdl.handle.net/2433/286449
DOI (publisher version): 10.1186/s40561-023-00271-9
Appears in collections: Journal Articles

License: This item is licensed under a Creative Commons License.