Access count for this item: 391

Files in this item:
File: jic083_349.pdf
Size: 863.88 kB
Format: Adobe PDF
Full metadata record
DC field: value [language]
dc.contributor.author: 安岡, 孝一 [ja]
dc.contributor.alternative: YASUOKA, Koichi [ja]
dc.contributor.transcription: ヤスオカ, コウイチ [ja]
dc.date.accessioned: 2009-12-07T06:45:17Z
dc.date.available: 2009-12-07T06:45:17Z
dc.date.issued: 2008-09-25 [ja]
dc.identifier.issn: 0304-2448 [ja]
dc.identifier.uri: http://hdl.handle.net/2433/88023
dc.description.abstract: This is a report of the proceedings of the research seminar "Constructing Kanji (漢字) Informatics", which was held from 2004 to 2008 and coordinated by Yasuoka Koichi. The seminar began by considering a hierarchical model for representing digital text, consisting of four layers: image layer, text layer, syntax layer and semantic layer. To better understand the relationship between the image and text layers, we spent some time analyzing the rules for vertical layout of complex text in Japanese and other East Asian languages, including the handling of pronunciation guides (so-called 'ruby'). The next step was to invert the direction and try to identify characters in the image representation of a text, in the same way an optical character recognition program proceeds. This turned out to be difficult, especially with stone rubbings that exhibit an irregular layout of the characters, but worked reasonably well for characters in a regular grid. In moving to the syntactic and semantic layers, the final topic of the seminar was to consider methods for adding punctuation marks (dots) to a Chinese text without any punctuation. After trying a number of different statistical approaches, such as looking at characters that appear before or after punctuation dots in already punctuated texts, 2-grams, or even rhyme patterns, it became evident that a purely statistical approach would not give the desired results, and that it was also necessary to take grammatical relations into account. The most promising approach in this respect seemed to be to use texts with reading marks for kanbun, which provide some basic grammatical annotation. It was therefore decided to devote a follow-up seminar to the development of a corpus of kanbun-annotated text that could be used as training and test material for morphological and syntactical parsers. [ja]
dc.format.mimetype: application/pdf [ja]
dc.language.iso: jpn [ja]
dc.publisher: 京都大學人文科學研究所 [ja]
dc.subject.ndc: 220 [ja]
dc.title: 「漢字情報学の構築」共同研究班報告 [ja]
dc.title.alternative: Report on the Research Seminar "Constructing Kanji (漢字) Informatics" [ja]
dc.type.niitype: Departmental Bulletin Paper [ja]
dc.identifier.ncid: AN00167025 [ja]
dc.identifier.jtitle: 東方學報 [ja]
dc.identifier.volume: 83 [ja]
dc.identifier.spage: 349 [ja]
dc.identifier.epage: 360 [ja]
dc.textversion: publisher [ja]
dc.sortkey: 04 [ja]
dc.identifier.selfDOI: 10.14989/88023 [ja]
Appears in collections: Vol. 83 (第83册)

All items stored in this repository are protected by copyright.