Downloads: 230
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
TASLP.2024.3407511.pdf | | 3.77 MB | Adobe PDF | View/Open |
Title: | Waveform-domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition |
Authors: | Shi, Hao; Mimura, Masato; Kawahara, Tatsuya |
Alternative author names: | 史, 昊; 三村, 正人; 河原, 達也 |
Keywords: | speech enhancement; robust ASR; time-frequency hybrid model; spectral information refining |
Issue date: | 2024 |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Journal title: | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume: | 32 |
Start page: | 3049 |
End page: | 3060 |
Abstract: | While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance in many datasets, spectrogram-based SE tends to show robust and stable enhancement behavior. In this paper, we propose a waveform-spectrogram hybrid method (WaveSpecEnc) to improve the robustness of waveform-domain SE. WaveSpecEnc refines the corresponding temporal feature map by spectrogram encoding in each encoder layer. Incorporating spectral information provides robust human hearing experience performance. However, it has a minor automatic speech recognition (ASR) improvement. Thus, we improve it for robust ASR by further utilizing spectrogram encoding information (WaveSpecEnc+) to both the SE front-end and ASR back-end. Experimental results using the CHiME-4 dataset show that ASR performance in real evaluation sets is consistently improved with the proposed method, which outperformed others, including DEMUCS and Conv-Tasnet. Refining in the shallow encoder layers is very effective, and the effect is confirmed even with a strong ASR baseline using WavLM. |
Rights: | © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This is not the published version. Please cite only the published version. |
URI: | http://hdl.handle.net/2433/287858 |
DOI (published version): | https://doi.org/10.1109/TASLP.2024.3407511 |
Appears in collections: | Journal Articles |
All items stored in this repository are protected by copyright.