Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

Shimada, Kazuki; Bando, Yoshiaki; Mimura, Masato; Itoyama, Katsutoshi; Yoshii, Kazuyoshi; Kawahara, Tatsuya


http://hdl.handle.net/2433/240994

Files in This Item:

File	Description	Size	Format
TASLP.2019.2907015.pdf		4.82 MB	Adobe PDF

Full metadata record

DC Field	Value	Language
dc.contributor.author	Shimada, Kazuki	en
dc.contributor.author	Bando, Yoshiaki	en
dc.contributor.author	Mimura, Masato	en
dc.contributor.author	Itoyama, Katsutoshi	en
dc.contributor.author	Yoshii, Kazuyoshi	en
dc.contributor.author	Kawahara, Tatsuya	en
dc.contributor.alternative	吉井, 和佳	ja
dc.contributor.alternative	河原, 達也	ja
dc.date.accessioned	2019-04-23T07:47:24Z	-
dc.date.available	2019-04-23T07:47:24Z	-
dc.date.issued	2019-05	-
dc.identifier.issn	2329-9290	-
dc.identifier.issn	2329-9304	-
dc.identifier.uri	http://hdl.handle.net/2433/240994	-
dc.description.abstract	This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, minimum variance distortionless response (MVDR) beamforming has been widely used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take a supervised approach that classifies each time-frequency (TF) bin as noise or speech by training a deep neural network (DNN). ASR performance, however, degrades in unknown noisy environments. To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF). This enables us to accurately estimate the SCMs of speech and noise not from observed noisy mixtures but from separated speech and noise components. In this paper, we propose online MVDR beamforming by effectively initializing and incrementally updating the parameters of MNMF. Another main contribution is a comprehensive investigation of the ASR performance obtained by various types of spatial filters, i.e., time-invariant and time-variant versions of MVDR beamformers and of rank-1 and full-rank multichannel Wiener filters, in combination with MNMF. The experimental results showed that the proposed method outperformed the state-of-the-art DNN-based beamforming method in unknown environments that did not match the training data.	en
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	-
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en
dc.rights	© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en
dc.rights	The full-text file will be made open to the public on 25 March 2021 in accordance with the publisher's 'Terms and Conditions for Self-Archiving'.	en
dc.rights	This paper is not the publisher's version. When citing, please check and use the publisher's version.	ja
dc.rights	This is not the published version. Please cite only the published version.	en
dc.subject	Noisy speech recognition	en
dc.subject	speech enhancement	en
dc.subject	multichannel nonnegative matrix factorization	en
dc.subject	beamforming	en
dc.title	Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition	en
dc.type	journal article	-
dc.type.niitype	Journal Article	-
dc.identifier.ncid	AA12669539	-
dc.identifier.jtitle	IEEE/ACM Transactions on Audio, Speech, and Language Processing	en
dc.identifier.volume	27	-
dc.identifier.issue	5	-
dc.identifier.spage	960	-
dc.identifier.epage	971	-
dc.relation.doi	10.1109/TASLP.2019.2907015	-
dc.textversion	author	-
dc.address	Graduate School of Informatics, Kyoto University	en
dc.address	Graduate School of Informatics, Kyoto University	en
dc.address	Graduate School of Informatics, Kyoto University	en
dc.address	Graduate School of Informatics, Kyoto University	en
dc.address	Graduate School of Informatics, Kyoto University	en
dc.address	Graduate School of Informatics, Kyoto University	en
dcterms.accessRights	open access	-
datacite.date.available	2021-03-25	-
dc.identifier.pissn	2329-9290	-
dc.identifier.eissn	2329-9304	-
Appears in Collections:	Journal Articles, etc.
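The abstract's central idea (estimate the speech and noise SCMs from separated components rather than from the noisy mixture, then plug them into an MVDR beamformer) can be illustrated with a short NumPy sketch. This is a hedged illustration, not the authors' implementation: it does not implement MNMF or the proposed online updating, and it assumes the separation step has already produced multichannel STFT estimates of speech and noise as arrays of shape (F, T, M). Taking the steering vector as the principal eigenvector of the speech SCM is a common choice assumed here, and the diagonal loading `eps` is an added assumption for numerical stability. All function names are hypothetical. The filter is the time-invariant MVDR beamformer w_f = R_n^{-1} h_f / (h_f^H R_n^{-1} h_f).

```python
import numpy as np

# Hypothetical helper names for illustration; not taken from the paper.


def spatial_covariance(X):
    """Per-frequency spatial covariance matrices (SCMs).

    X: complex multichannel STFT of shape (F, T, M).
    Returns R of shape (F, M, M) with R[f] = (1/T) * sum_t x_ft x_ft^H.
    """
    T = X.shape[1]
    return np.einsum("ftm,ftn->fmn", X, X.conj()) / T


def steering_vector(R_speech):
    """Principal eigenvector of each speech SCM (assumed rank-1 speech model).

    eigh returns eigenvalues in ascending order, so the last column is the
    eigenvector with the largest eigenvalue. Returns shape (F, M), unit norm.
    """
    _, vecs = np.linalg.eigh(R_speech)
    return vecs[:, :, -1]


def mvdr_weights(h, R_noise, eps=1e-6):
    """Time-invariant MVDR filter w_f = R_n^{-1} h_f / (h_f^H R_n^{-1} h_f).

    h: steering vectors (F, M); R_noise: noise SCMs (F, M, M).
    eps: diagonal loading for numerical stability (an assumption).
    """
    M = h.shape[1]
    R = R_noise + eps * np.eye(M)[None, :, :]
    Rinv_h = np.linalg.solve(R, h[:, :, None])[:, :, 0]  # R^{-1} h, (F, M)
    denom = np.einsum("fm,fm->f", h.conj(), Rinv_h)      # h^H R^{-1} h, (F,)
    return Rinv_h / denom[:, None]


def apply_beamformer(w, X):
    """Enhanced single-channel STFT y_ft = w_f^H x_ft, shape (F, T)."""
    return np.einsum("fm,ftm->ft", w.conj(), X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, T, M = 4, 200, 3
    # Random stand-ins for MNMF-separated speech and noise components.
    speech = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
    noise = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
    h = steering_vector(spatial_covariance(speech))
    w = mvdr_weights(h, spatial_covariance(noise))
    y = apply_beamformer(w, speech + noise)
    print(y.shape)  # (4, 200)
```

Because the SCMs are computed from the separated components, the noise SCM is not contaminated by speech, which is the motivation the abstract gives for this design. The distortionless constraint w_f^H h_f = 1 holds by construction, so the target direction passes with unit gain while noise power is minimized.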


All items in this repository are protected by copyright.