Performance evaluation of noisy shouted speech detection based on acoustic model with rahmonic and mel-frequency cepstrum coefficients

Fukumori, Takahiro; Nakayama, Masato; Nishiura, Takanobu; Nanjo, Hiroaki

http://hdl.handle.net/2433/228957

Files in this item:

File	Description	Size	Format
IEICE_tec.rep_SP2016-127.pdf		344.76 kB	Adobe PDF

Title (original Japanese):	Rahmonicとメルケプストラムを用いた音響モデルに基づく騒音環境下叫び声検出の性能評価
Title (English):	Performance evaluation of noisy shouted speech detection based on acoustic model with rahmonic and mel-frequency cepstrum coefficients
Authors (original Japanese):	福森, 隆寛; 中山, 雅人; 西浦, 敬信; 南條, 浩輝
Authors:	Fukumori, Takahiro; Nakayama, Masato; Nishiura, Takanobu; Nanjo, Hiroaki
Keywords:	Shouted speech detection; Noisy environment; Rahmonic; Mel-frequency cepstrum coefficients
Issue date:	Mar-2017
Publisher:	Institute of Electronics, Information and Communication Engineers (IEICE, 電子情報通信学会)
Journal title:	IEICE Technical Report (信学技報)
Volume:	116
Issue:	477
Start page:	283
End page:	286
Paper number:	SP2016-127
Abstract:	This paper describes a method for detecting shouted speech in noisy environments using rahmonic and mel-frequency cepstrum coefficients (MFCCs). MFCCs are cepstral coefficients that take human auditory characteristics into account and represent the vocal tract features used to identify phonemes. Rahmonic is a subharmonic component of the fundamental frequency that appears in the cepstrum domain and represents features related to vocal cord movement. In our previous method, shouted speech was detected with a Gaussian mixture model (GMM) built from MFCCs and rahmonic extracted from a large amount of normal and shouted speech. In this paper, we extend this acoustic model to a hidden Markov model (HMM) and a deep neural network (DNN) and evaluate shouted speech detection performance in noisy environments. The results confirm that MFCCs and rahmonic efficiently represent the utterance mechanism of shouted speech, covering both vocal tract and vocal cord characteristics. In addition, in most of the noisy environments, using a DNN as the acoustic model achieved higher shouted speech detection performance than the GMM or HMM.
Rights:	Copyright © 2017 by IEICE
URI:	http://hdl.handle.net/2433/228957
Related link:	http://www.ieice.org/ken/paper/201703028bSI/
Appears in collections:	Journal Articles, etc.
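
The abstract describes rahmonic as a subharmonic of the fundamental frequency observed in the cepstrum domain. As a rough illustration only, the sketch below computes a real cepstrum with NumPy, picks the fundamental quefrency peak, and reads the cepstral value at twice that quefrency. This record does not describe the paper's actual feature extraction, so the function names, the 120-400 Hz search range, the choice of twice the fundamental quefrency, and the synthetic impulse-train frame are all assumptions made for this example.

```python
import numpy as np


def real_cepstrum(frame, eps=1e-10):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame)
    # eps keeps the logarithm finite where the magnitude is zero.
    log_mag = np.log(np.abs(spectrum) + eps)
    return np.fft.irfft(log_mag, n=len(frame))


def pitch_and_rahmonic(frame, fs, f0_min=120.0, f0_max=400.0):
    """Return (fundamental quefrency in samples, f0 estimate in Hz,
    cepstral value at twice the fundamental quefrency).

    The search range and the use of twice the fundamental quefrency
    are assumptions for illustration, not the paper's settings.
    """
    cep = real_cepstrum(frame)
    q_lo = int(fs / f0_max)  # shortest pitch period considered
    q_hi = int(fs / f0_min)  # longest pitch period considered
    q0 = q_lo + int(np.argmax(cep[q_lo:q_hi + 1]))
    return q0, fs / q0, cep[2 * q0]


# Hypothetical test frame: 0.1 s impulse train with a 200 Hz pitch.
fs = 16000
frame = np.zeros(1600)
frame[::80] = 1.0  # one impulse every 80 samples (16000 / 200)

q0, f0, rahmonic_value = pitch_and_rahmonic(frame, fs)
print(q0, f0, rahmonic_value)
```

In a detector of the kind the abstract outlines, a value like this would be appended to the MFCC vector of each frame before training the GMM, HMM, or DNN; that step is not shown here.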
