
Files in this item:
File | Description | Size | Format
ICBIR57571.2023.10147628.pdf | - | 827.76 kB | Adobe PDF
Full metadata record
DC Field | Value | Language
dc.contributor.author | Yasuoka, Koichi | en
dc.contributor.alternative | 安岡, 孝一 | ja
dc.date.accessioned | 2023-07-06T02:56:14Z | -
dc.date.available | 2023-07-06T02:56:14Z | -
dc.date.issued | 2023-05 | -
dc.identifier.isbn | 9798350399646 | -
dc.identifier.uri | http://hdl.handle.net/2433/284021 | -
dc.description | 2023 8th International Conference on Business and Industrial Research (ICBIR), Bangkok, Thailand. 18-19 May 2023 | en
dc.description.abstract | The author and his colleagues have been developing a classical Chinese treebank using Universal Dependencies. We also developed the RoBERTa-Classical-Chinese model, pre-trained on classical Chinese texts of 1.7 billion characters. In this paper we describe how to fine-tune a sequence-labeling RoBERTa model for dependency parsing in classical Chinese. We introduce “goeswith”-labeled edges into the directed acyclic graphs of Universal Dependencies in order to resolve the mismatch between the token length of RoBERTa-Classical-Chinese and the word length in classical Chinese. We utilize the [MASK] token of the RoBERTa model to handle outgoing edges and to produce the adjacency matrices for the graphs of Universal Dependencies. Our RoBERTa-UDgoeswith model outperforms other dependency parsers in classical Chinese on LAS/MLAS/BLEX benchmark scores. We then apply our methods to other isolating languages. For Vietnamese we introduce “goeswith”-labeled edges to separate words into space-separated syllables, and fine-tune RoBERTa and PhoBERT models. For Thai we try three kinds of tokenizers, a character-wise tokenizer, a quasi-syllable tokenizer, and SentencePiece, to produce RoBERTa models. | en
dc.language.iso | eng | -
dc.publisher | IEEE | en
dc.rights | © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en
dc.rights | This is not the published version. Please cite only the published version. | en
dc.subject | dependency-parsing | en
dc.subject | part-of-speech tagging | en
dc.subject | sequence-labeling | en
dc.subject | Universal Dependencies | en
dc.subject | pre-trained language model | en
dc.title | Sequence-Labeling RoBERTa Model for Dependency-Parsing in Classical Chinese and Its Application to Vietnamese and Thai | en
dc.type | conference paper | -
dc.type.niitype | Conference Paper | -
dc.identifier.jtitle | 2023 8th International Conference on Business and Industrial Research (ICBIR) | en
dc.identifier.spage | 169 | -
dc.identifier.epage | 173 | -
dc.relation.doi | 10.1109/ICBIR57571.2023.10147628 | -
dc.textversion | author | -
dc.address | Institute for Research in Humanities, Kyoto University | en
dc.relation.url | https://icbir.tni.ac.th/ | -
dcterms.accessRights | open access | -
jpcoar.conferenceName | International Conference on Business and Industrial Research (ICBIR) | en
jpcoar.conferenceSequence | 8 | -
jpcoar.conferenceSponsor | Thai-Nichi Institute of Technology (TNI), Technology Promotion Association (Thailand-Japan) (TPA), and Artificial Intelligence Association of Thailand (AIAT) | en
jpcoar.conferenceDate | May 18-19, 2023 | en
jpcoar.conferenceStartDate | 2023-05-18 | -
jpcoar.conferenceEndDate | 2023-05-19 | -
jpcoar.conferenceVenue | E Building, Thai-Nichi Institute of Technology | en
jpcoar.conferencePlace | Bangkok | en
jpcoar.conferenceCountry | THA | -
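
The abstract above frames dependency parsing as token-level sequence labeling: a token-classification head predicts, for each token, an encoding of its head together with a UD relation, and “goeswith” labels merge model tokens back into words. Below is a minimal sketch of running such a model through the Hugging Face transformers API; the model name is an assumption based on the paper's RoBERTa-UDgoeswith naming, and the exact label scheme depends on the released model's configuration, so both should be verified on the hub.

```python
# Minimal sketch, not the authors' exact pipeline: one-pass sequence-labeling
# dependency parsing with a RoBERTa token-classification head.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Assumed model name, following the paper's "RoBERTa-UDgoeswith" naming.
MODEL = "KoichiYasuoka/roberta-classical-chinese-base-ud-goeswith"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL)

text = "不入虎穴不得虎子"  # classical Chinese example sentence
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, num_labels)

# Each label couples a head encoding with a UD relation; "goeswith" labels
# glue tokens into words. Inspect model.config.id2label for the exact scheme.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[i] for i in logits.argmax(-1)[0].tolist()]
for token, label in zip(tokens, labels):
    print(token, label)
```

Note that the full method described in the abstract additionally probes positions with the [MASK] token to fill an adjacency matrix over token pairs before extracting the Universal Dependencies graph; the sketch shows only the plain single-pass labeling step.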
Appears in Collections: Journal Articles, etc.

All items in this repository are protected by copyright.