Language Model and Speaking Rate Adaptation for Spontaneous Presentation Speech Recognition

Nanjo, H.; Kawahara, T.

Downloads: 686

http://hdl.handle.net/2433/128905

Files in this item:

File	Description	Size	Format
TSA.2004.828641.pdf		425.35 kB	Adobe PDF

Full metadata record

DC Field	Value	Language
dc.contributor.author	Nanjo, H.	en
dc.contributor.author	Kawahara, T.	en
dc.contributor.alternative	河原, 達也	ja
dc.date.accessioned	2010-10-21T01:56:07Z	-
dc.date.available	2010-10-21T01:56:07Z	-
dc.date.issued	2004-07	-
dc.identifier.issn	1063-6676	-
dc.identifier.uri	http://hdl.handle.net/2433/128905	-
dc.description.abstract	The paper addresses adaptation methods for the language model and speaking rate (SR) of individual speakers, which are two major problems in automatic transcription of spontaneous presentation speech. To cope with the large variation in expression and pronunciation of words depending on the speaker, we first investigate the effect of statistical and context-dependent pronunciation modeling. Second, we present unsupervised methods of language model adaptation to a specific speaker and topic by 1) selecting similar texts based on word perplexity and the TF-IDF measure and 2) making direct use of the initial recognition result to generate an enhanced model. We confirm that all proposed adaptation methods and their combinations reduce perplexity and word error rate. We also present a decoding strategy adapted to the SR. In spontaneous speech, the SR is generally fast and may vary considerably. We also observe different error tendencies for portions of presentations where speech is fast or slow. Therefore, we propose an SR-dependent decoding strategy that applies the most appropriate acoustic analysis, phone models, and decoding parameters according to the SR. Several methods are investigated, and their selective application leads to improved accuracy. The combined effect of the two proposed adaptation methods is also confirmed in the transcription of real academic presentations.	en
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	-
dc.publisher	IEEE	en
dc.rights	© 2004 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.	en
dc.title	Language Model and Speaking Rate Adaptation for Spontaneous Presentation Speech Recognition	en
dc.type	journal article	-
dc.type.niitype	Journal Article	-
dc.identifier.ncid	AA10888994	-
dc.identifier.jtitle	IEEE Transactions on Speech and Audio Processing	en
dc.identifier.volume	12	-
dc.identifier.issue	4	-
dc.identifier.spage	391	-
dc.identifier.epage	400	-
dc.relation.doi	10.1109/TSA.2004.828641	-
dc.textversion	publisher	-
dcterms.accessRights	open access	-
Appears in Collections:	Journal Articles, etc.
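
The abstract mentions selecting similar texts using word perplexity and the TF-IDF measure. As a rough illustration of the TF-IDF half only, the sketch below ranks candidate texts by cosine similarity to an initial recognition hypothesis. It is not the authors' implementation: the function name `select_similar_texts`, the smoothed IDF formula, and the `top_k` cutoff are all assumptions made for this example, and the perplexity criterion is omitted.

```python
import math
from collections import Counter


def select_similar_texts(hypothesis, candidates, top_k=2):
    """Rank tokenized candidate texts by TF-IDF cosine similarity.

    hypothesis: list of words (e.g. an initial recognition result).
    candidates: list of word lists (texts that could be selected).
    Returns the indices of the top_k most similar candidates.
    Hypothetical helper; the smoothed IDF is an assumed choice.
    """
    n_docs = len(candidates)
    # Document frequency: how many candidates contain each word.
    df = Counter(word for doc in candidates for word in set(doc))

    def idf(word):
        # Smoothed IDF so unseen words do not cause division by zero.
        return math.log((1 + n_docs) / (1 + df[word])) + 1.0

    def vectorize(tokens):
        # Sparse TF-IDF vector stored as {word: weight}.
        return {w: count * idf(w) for w, count in Counter(tokens).items()}

    def cosine(a, b):
        dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = vectorize(hypothesis)
    scored = [(cosine(query, vectorize(doc)), i)
              for i, doc in enumerate(candidates)]
    # Highest similarity first; ties broken by original order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_k]]
```

In an adaptation pipeline of the kind the abstract describes, the selected texts would then be used to re-estimate or interpolate the language model; that step is outside the scope of this sketch.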


All items in this repository are protected by copyright.