Statistical Transformation of Language and Pronunciation Models for Spontaneous Speech Recognition

Akita, Yuya; Kawahara, Tatsuya

Downloads: 529

http://hdl.handle.net/2433/128842

Files in this item:

File	Description	Size	Format
TASL.2009.2037400.pdf		577.33 kB	Adobe PDF

Full metadata record

DC Field	Value	Language
dc.contributor.author	Akita, Yuya	en
dc.contributor.author	Kawahara, Tatsuya	en
dc.contributor.alternative	秋田, 祐哉	ja
dc.date.accessioned	2010-10-19T04:54:49Z	-
dc.date.available	2010-10-19T04:54:49Z	-
dc.date.issued	2010-08	-
dc.identifier.issn	1558-7916	-
dc.identifier.uri	http://hdl.handle.net/2433/128842	-
dc.description.abstract	We propose a novel approach based on a statistical transformation framework for language and pronunciation modeling of spontaneous speech. Since it is not practical to train a spoken-style model using numerous spoken transcripts, the proposed approach generates a spoken-style model by transforming an orthographic model trained with document archives such as the minutes of meetings and the proceedings of lectures. The transformation is based on a statistical model estimated using a small amount of a parallel corpus, which consists of faithful transcripts aligned with their orthographic documents. Patterns of transformation, such as substitution, deletion, and insertion of words, are extracted with their word and part-of-speech (POS) contexts, and transformation probabilities are estimated based on occurrence statistics in a parallel aligned corpus. For pronunciation modeling, subword-based mapping between baseforms and surface forms is extracted with their occurrence counts, then a set of rewrite rules with their probabilities are derived as a transformation model. Spoken-style language and pronunciation (surface forms) models can be predicted by applying these transformation patterns to a document-style language model and baseforms in a lexicon, respectively. The transformed models significantly reduced perplexity and word error rates (WERs) in a task of transcribing congressional meetings, even though the domains and topics were different from the parallel corpus. This result demonstrates the generality and portability of the proposed framework.	en
dc.language.iso	eng	-
dc.publisher	IEEE	en
dc.rights	(c) 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.	en
dc.title	Statistical Transformation of Language and Pronunciation Models for Spontaneous Speech Recognition	en
dc.type	journal article	-
dc.type.niitype	Journal Article	-
dc.identifier.ncid	AA12103538	-
dc.identifier.jtitle	IEEE Transactions on Audio, Speech, and Language Processing	en
dc.identifier.volume	18	-
dc.identifier.issue	6	-
dc.identifier.spage	1539	-
dc.identifier.epage	1549	-
dc.relation.doi	10.1109/TASL.2009.2037400	-
dc.textversion	publisher	-
dcterms.accessRights	open access	-
Appears in Collections:	Journal Articles, etc.
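
The transformation idea summarized in dc.description.abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it ignores the word and POS contexts the abstract mentions, works on unigram counts only, and handles substitution and deletion but not insertion. The names `EPS`, `estimate_transform_probs`, and `transform_counts`, as well as the toy English data, are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

EPS = "<eps>"  # hypothetical marker: the written word has no spoken counterpart (deletion)


def estimate_transform_probs(aligned_pairs):
    """Relative-frequency estimate of P(spoken | written) from aligned token pairs."""
    pair_counts = defaultdict(Counter)
    for written, spoken in aligned_pairs:
        pair_counts[written][spoken] += 1
    probs = {}
    for written, counter in pair_counts.items():
        total = sum(counter.values())
        probs[written] = {spoken: c / total for spoken, c in counter.items()}
    return probs


def transform_counts(doc_counts, probs):
    """Redistribute document-style unigram counts into expected spoken-style counts."""
    spoken_counts = Counter()
    for written, n in doc_counts.items():
        # Assumption: words unseen in the parallel corpus pass through unchanged.
        for spoken, p in probs.get(written, {written: 1.0}).items():
            if spoken != EPS:  # a deleted word contributes no spoken token
                spoken_counts[spoken] += n * p
    return spoken_counts


# Toy parallel corpus: (written token, spoken token) pairs after alignment.
aligned = [
    ("going", "gonna"), ("going", "going"), ("going", "going"), ("going", "going"),
    ("that", EPS), ("that", "that"),
]
probs = estimate_transform_probs(aligned)

# Toy document-style unigram counts, standing in for a large document archive.
doc_counts = {"going": 8, "that": 4, "meeting": 5}
spoken = transform_counts(doc_counts, probs)
print(dict(spoken))
```

The point of the sketch is the data flow: a small aligned corpus yields transformation probabilities, and those probabilities are applied to statistics from a much larger document-style corpus, so no large spoken transcript collection is required.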

All items stored in this repository are protected by copyright.