Files in This Item:
ICBIR57571.2023.10147628.pdf (827.76 kB, Adobe PDF)
Title: Sequence-Labeling RoBERTa Model for Dependency-Parsing in Classical Chinese and Its Application to Vietnamese and Thai
Authors: Yasuoka, Koichi
Author's alias: 安岡, 孝一
Keywords: dependency-parsing
part-of-speech tagging
sequence-labeling
Universal Dependencies
pre-trained language model
Issue Date: May-2023
Publisher: IEEE
Journal title: 2023 8th International Conference on Business and Industrial Research (ICBIR)
Start page: 169
End page: 173
Abstract: The author and his colleagues have been developing a classical Chinese treebank using Universal Dependencies. We also developed the RoBERTa-Classical-Chinese model, pre-trained on classical Chinese texts of 1.7 billion characters. In this paper we describe how to finetune a sequence-labeling RoBERTa model for dependency-parsing in classical Chinese. We introduce “goeswith”-labeled edges into the directed acyclic graphs of Universal Dependencies in order to resolve the mismatch between the token length of RoBERTa-Classical-Chinese and the word length in classical Chinese. We utilize the [MASK] token of the RoBERTa model to handle outgoing edges and to produce the adjacency matrices for the graphs of Universal Dependencies. Our RoBERTa-UDgoeswith model outperforms other dependency parsers in classical Chinese on LAS/MLAS/BLEX benchmark scores. We then apply our methods to other isolating languages. For Vietnamese we introduce “goeswith”-labeled edges to separate words into space-separated syllables, and finetune RoBERTa and PhoBERT models. For Thai we try three kinds of tokenizers (a character-wise tokenizer, a quasi-syllable tokenizer, and SentencePiece) to produce RoBERTa models.
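For readers unfamiliar with the general idea of casting dependency-parsing as sequence labeling, the sketch below illustrates one common convention: each token is tagged with a label encoding its dependency relation plus the relative position of its head, and the labels are then decoded into CoNLL-U-style head indices. This is a minimal illustration under assumed label conventions, not the paper's exact goeswith/[MASK]-based scheme; the decode_labels helper, the "offset|deprel" label format, and the hand-written toy labels are all hypothetical.

# Minimal sketch of dependency parsing as sequence labeling (assumed conventions,
# not the labeling scheme described in the paper).
# Each token carries a label "<relative head offset>|<deprel>"; offset 0 marks the root.

def decode_labels(tokens, labels):
    """Turn per-token labels like "-1|nsubj" into (id, form, head, deprel) rows."""
    rows = []
    for i, (form, label) in enumerate(zip(tokens, labels), start=1):
        offset_str, deprel = label.split("|")
        offset = int(offset_str)
        head = 0 if offset == 0 else i + offset  # 0 = root in CoNLL-U numbering
        rows.append((i, form, head, deprel))
    return rows

# Toy example "子曰學而時習之" with hand-written labels, for illustration only.
tokens = ["子", "曰", "學", "而", "時", "習", "之"]
labels = ["+1|nsubj", "0|root", "-1|ccomp", "+2|cc", "+1|obl", "-3|conj", "-1|obj"]

for row in decode_labels(tokens, labels):
    print("\t".join(map(str, row)))

In practice the per-token labels would come from a finetuned token-classification head on a pre-trained model such as RoBERTa-Classical-Chinese; the paper additionally uses “goeswith”-labeled edges to handle tokenizer/word mismatches and the [MASK] token to recover adjacency matrices, which this sketch does not reproduce.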
Description: 2023 8th International Conference on Business and Industrial Research (ICBIR), Bangkok, Thailand. 18-19 May 2023
Rights: © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
This is not the published version. Please cite only the published version.
URI: http://hdl.handle.net/2433/284021
DOI (Published Version): 10.1109/ICBIR57571.2023.10147628
Related Link: https://icbir.tni.ac.th/
Appears in Collections: Journal Articles
