Title: SCTB-V2: the 2nd version of the Chinese treebank in the scientific domain
Authors: Chu, Chenhui (ORCID: https://orcid.org/0000-0001-9848-6384, unconfirmed)
Mao, Zhuoyuan
Nakazawa, Toshiaki
Kawahara, Daisuke
Kurohashi, Sadao
Alternative author names: 褚, 晨翚
毛, 卓遠
黒橋, 禎夫
Keywords: Treebank
Scientific domain
Issue date: Sep-2023
Publisher: Springer Nature
Journal title: Language Resources and Evaluation
Volume: 57
Issue: 3
Start page: 1389
End page: 1403
Abstract: Word segmentation, part-of-speech (POS) tagging, and syntactic parsing are three fundamental Chinese analysis tasks for Chinese language processing, which are also crucial for various downstream tasks such as machine translation and information extraction. To achieve high accuracy for these tasks, treebanks that contain sentences manually annotated with word segmentation, part-of-speech tags, and phrase structures are essential. Although there are large-scale Chinese treebanks in the news domain, such treebanks are unavailable in the scientific domain. This significantly limits the performance of Chinese language processing for scientific text. To address this problem, we annotate the 2nd version of the Chinese treebank in the scientific domain (SCTB-V2). SCTB-V2 contains 12,175 sentences annotated with word segmentation, part-of-speech tags, and phrase structures. We conducted Chinese analyses and machine translation experiments on SCTB-V2. The results show the effectiveness of SCTB-V2. We release this treebank to promote scientific Chinese language processing research: http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?A%20Chinese%20Treebank%20in%20Scientific%20Domain%20%28SCTB%29
Rights: This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s10579-022-09615-2
The full-text file will be made open to the public on 15 October 2023 in accordance with publisher's 'Terms and Conditions for Self-Archiving'.
This is not the published version. Please cite only the published version.
URI: http://hdl.handle.net/2433/284725
DOI (publisher version): 10.1007/s10579-022-09615-2


