Report on the Research Seminar "Constructing Kanji (漢字) Informatics" (「漢字情報学の構築」共同研究班報告)

YASUOKA, Koichi (安岡, 孝一)

https://doi.org/10.14989/88023

Files in this item:

File	Description	Size	Format
jic083_349.pdf		863.88 kB	Adobe PDF

Full metadata record

DC Field	Value	Language
dc.contributor.author	安岡, 孝一	ja
dc.contributor.alternative	YASUOKA, Koichi	en
dc.contributor.transcription	ヤスオカ, コウイチ	ja-Kana
dc.date.accessioned	2009-12-07T06:45:17Z	-
dc.date.available	2009-12-07T06:45:17Z	-
dc.date.issued	2008-09-25	-
dc.identifier.issn	0304-2448	-
dc.identifier.uri	http://hdl.handle.net/2433/88023	-
dc.description.abstract	This is a report on the proceedings of the research seminar "Constructing Kanji (漢字) Informatics", which was held from 2004 to 2008 and coordinated by Yasuoka Koichi. The seminar began by considering a hierarchical model for representing digital text, consisting of four layers: an image layer, a text layer, a syntax layer and a semantic layer. To better understand the relationship between the image and text layers, we spent some time analyzing the rules for vertical layout of complex text in Japanese and other East Asian languages, including the handling of pronunciation guides (so-called 'ruby'). The next step was to invert the direction and try to identify characters in the image representation of a text, in the same way an optical character recognition program proceeds. This turned out to be difficult, especially with stone rubbings that exhibit an irregular layout of the characters, but worked reasonably well for characters in a regular grid. In moving to the syntactic and semantic layers, the final topic of the seminar was to consider methods for adding punctuation marks (dots) to a Chinese text without any punctuation. After trying a number of different statistical approaches, such as looking at characters that appear before or after punctuation dots in already punctuated texts, 2-grams, or even rhyme patterns, it became evident that a purely statistical approach would not give the desired results, and that it was also necessary to take grammatical relations into account. The most promising approach in this respect seemed to be to use texts with reading marks for kanbun, which provide some basic grammatical annotation. It was therefore decided to devote a follow-up seminar to the development of a corpus of kanbun-annotated text that could be used as training and test material for morphological and syntactic parsers.	en
dc.format.mimetype	application/pdf	-
dc.language.iso	jpn	-
dc.publisher	京都大學人文科學研究所	ja
dc.publisher.alternative	Institute for Research in Humanities, Kyoto University	en
dc.subject.ndc	220	-
dc.title	「漢字情報学の構築」共同研究班報告	ja
dc.title.alternative	Report on the Research Seminar "Constructing Kanji (漢字) Informatics"	en
dc.type	departmental bulletin paper	-
dc.type.niitype	Departmental Bulletin Paper	-
dc.identifier.ncid	AN00167025	-
dc.identifier.jtitle	東方學報	ja
dc.identifier.volume	83	-
dc.identifier.spage	349	-
dc.identifier.epage	360	-
dc.textversion	publisher	-
dc.sortkey	04	-
dc.identifier.selfDOI	10.14989/88023	-
dcterms.accessRights	open access	-
dc.identifier.pissn	0304-2448	-
dc.identifier.jtitle-alternative	The Tôhô Gakuhô : Journal of Oriental Studies	en
Appears in Collections:	Vol. 83 (第83册)

All items held in this repository are protected by copyright.