Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature

Mimura, Masato; Sakai, Shinsuke; Kawahara, Tatsuya

Access count for this item: 403

http://hdl.handle.net/2433/201887

Files in this item:

File	Description	Size	Format
s13634-015-0246-6.pdf		2.11 MB	Adobe PDF

Full metadata record

DC field	Value	Language
dc.contributor.author	Mimura, Masato	en
dc.contributor.author	Sakai, Shinsuke	en
dc.contributor.author	Kawahara, Tatsuya	en
dc.contributor.alternative	三村, 正人	ja
dc.date.accessioned	2015-11-24T07:34:53Z	-
dc.date.available	2015-11-24T07:34:53Z	-
dc.date.issued	2015-07-23	-
dc.identifier.issn	1687-6172	-
dc.identifier.uri	http://hdl.handle.net/2433/201887	-
dc.description.abstract	We propose an approach to reverberant speech recognition adopting deep learning in the front-end as well as back-end of a reverberant speech recognition system, and a novel method to improve the dereverberation performance of the front-end network using phone-class information. At the front-end, we adopt a deep autoencoder (DAE) for enhancing the speech feature parameters, and speech recognition is performed in the back-end using DNN-HMM acoustic models trained on multi-condition data. The system was evaluated through the ASR task in the Reverb Challenge 2014. The DNN-HMM system trained on the multi-condition training set achieved a conspicuously higher word accuracy compared to the MLLR-adapted GMM-HMM system trained on the same data. Furthermore, feature enhancement with the deep autoencoder contributed to the improvement of recognition accuracy especially in the more adverse conditions. While the mapping between reverberant and clean speech in DAE-based dereverberation is conventionally conducted only with the acoustic information, we presume the mapping is also dependent on the phone information. Therefore, we propose a new scheme (pDAE), which augments a phone-class feature to the standard acoustic features as input. Two types of the phone-class feature are investigated. One is the hard recognition result of monophones, and the other is a soft representation derived from the posterior outputs of monophone DNN. The augmented feature in either type results in a significant improvement (7–8% relative) from the standard DAE.	en
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	-
dc.publisher	SpringerOpen	en
dc.rights	© 2015 Mimura et al.	en
dc.rights	This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.	en
dc.subject	Reverberant speech recognition	en
dc.subject	Deep Neural Networks (DNN)	en
dc.subject	Deep Autoencoder (DAE)	en
dc.title	Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature	en
dc.type	journal article	-
dc.type.niitype	Journal Article	-
dc.identifier.jtitle	EURASIP Journal on Advances in Signal Processing	en
dc.identifier.volume	2015	-
dc.identifier.issue	1	-
dc.relation.doi	10.1186/s13634-015-0246-6	-
dc.textversion	publisher	-
dc.identifier.artnum	62	-
dcterms.accessRights	open access	-
Appears in collections:	Journal articles, etc.
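
The abstract describes augmenting the acoustic input of the dereverberation autoencoder with a phone-class feature, in either a hard or a soft form. The following is a minimal sketch of what such input augmentation could look like, not the authors' implementation. The function names, the 40-dimensional acoustic frames, the 3-class phone inventory, the one-hot encoding of the hard labels, and the use of a softmax over random scores as a stand-in for monophone DNN posteriors are all assumptions made for illustration.

```python
import numpy as np


def hard_phone_feature(phone_ids, num_phones):
    """One-hot encode frame-level monophone labels (assumed encoding)."""
    phone_ids = np.asarray(phone_ids)
    one_hot = np.zeros((phone_ids.shape[0], num_phones), dtype=np.float32)
    one_hot[np.arange(phone_ids.shape[0]), phone_ids] = 1.0
    return one_hot


def soft_phone_feature(scores):
    """Turn per-frame phone scores into posteriors with a stable softmax."""
    scores = np.asarray(scores, dtype=np.float32)
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def augment(acoustic, phone_feature):
    """Append the phone-class feature to every acoustic frame."""
    return np.concatenate([acoustic, phone_feature], axis=1)


# Hypothetical sizes: 5 frames, 40 acoustic dimensions, 3 phone classes.
rng = np.random.default_rng(0)
frames, acoustic_dim, num_phones = 5, 40, 3
acoustic = rng.standard_normal((frames, acoustic_dim)).astype(np.float32)

# Hard variant: frame-level recognition result as one-hot vectors.
hard_input = augment(acoustic, hard_phone_feature([0, 2, 1, 1, 0], num_phones))

# Soft variant: random scores stand in for monophone DNN outputs.
soft_input = augment(
    acoustic, soft_phone_feature(rng.standard_normal((frames, num_phones)))
)

print(hard_input.shape, soft_input.shape)
```

In both variants the autoencoder input grows by one dimension per phone class; only the content of the appended block differs (a single 1 per frame for the hard form, a probability distribution for the soft form).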
