Shouted speech detection using hidden markov model with rahmonic and mel-frequency cepstrum coefficients

Fukumori, Takahiro; Nakayama, Masato; Nishiura, Takanobu; Nanjo, Hiroaki

Downloads: 399

http://hdl.handle.net/2433/229399

Files in this item:

File	Description	Size	Format
1.4969503.pdf		113.16 kB	Adobe PDF

Full metadata record

DC Field	Value	Language
dc.contributor.author	Fukumori, Takahiro	en
dc.contributor.author	Nakayama, Masato	en
dc.contributor.author	Nishiura, Takanobu	en
dc.contributor.author	Nanjo, Hiroaki	en
dc.contributor.alternative	南條, 浩輝	ja
dc.date.accessioned	2018-02-23T01:41:26Z	-
dc.date.available	2018-02-23T01:41:26Z	-
dc.date.issued	2016-10	-
dc.identifier.issn	0001-4966	-
dc.identifier.uri	http://hdl.handle.net/2433/229399	-
dc.description.abstract	In recent years, crime prevention systems have been developed to detect various hazardous situations. These systems generally rely on image information recorded by a camera to monitor such situations. However, it is difficult to detect hazardous situations in a camera's blind areas. To address this problem, acoustic information produced in such situations must be used in addition to image information. Our previous study showed that two acoustic features, rahmonic and mel-frequency cepstrum coefficients (MFCCs), are effective for detecting shouted speech. Rahmonic represents a subharmonic of the fundamental frequency in the cepstrum domain, and MFCCs are the coefficients that collectively make up the mel-frequency cepstrum. In the previous method, a shouted speech model is constructed from these features using a Gaussian mixture model (GMM). However, the GMM-based method has difficulty representing temporal changes in the speech features. In this study, we extend the previous method with a hidden Markov model (HMM), whose state transitions can represent these temporal changes. In objective experiments, the proposed HMM-based method achieved higher shouted speech detection performance than the conventional GMM-based method.	en
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	-
dc.publisher	Acoustical Society of America (ASA)	en
dc.rights	Copyright 2016 Acoustical Society of America. This article may be downloaded for personal use only. Any other use requires prior permission of the author and the Acoustical Society of America. The following article appeared in 'The Journal of the Acoustical Society of America 140, 3057 (2016)' and may be found at https://doi.org/10.1121/1.4969503.	en
dc.rights	Parts of this PDF are hidden in accordance with the publisher's permission conditions.	en
dc.subject	Speech recognition	en
dc.subject	Markov processes	en
dc.subject	Automatic speech recognition systems	en
dc.subject	Image detection systems	en
dc.subject	Cameras	en
dc.title	Shouted speech detection using hidden markov model with rahmonic and mel-frequency cepstrum coefficients	en
dc.type	other	-
dc.type.niitype	Others	-
dc.identifier.jtitle	The Journal of the Acoustical Society of America	en
dc.identifier.volume	140	-
dc.identifier.issue	4	-
dc.identifier.spage	3057	-
dc.identifier.epage	3057	-
dc.relation.doi	10.1121/1.4969503	-
dc.textversion	publisher	-
dc.identifier.artnum	2aSPb7	-
dc.address	Ritsumeikan Univ.	en
dc.address	Ritsumeikan Univ.	en
dc.address	Ritsumeikan Univ.	en
dc.address	Kyoto Univ.	en
dcterms.accessRights	open access	-
Appears in Collections:	Journal Articles
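
The abstract contrasts a GMM, which scores each frame independently, with an HMM, whose state transitions can represent temporal change. The following is a minimal Python sketch of that distinction only, not the authors' implementation. It assumes a hypothetical one-dimensional feature in place of the rahmonic and MFCC vectors, and every parameter value (means, variances, mixture weights, transition probabilities) is invented for illustration. It scores a rising sequence and its time-reversed copy with both models.

```python
import math


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_loglik(frames, weights, means, variances):
    """GMM log-likelihood: frames are scored independently and summed."""
    return sum(
        logsumexp([safe_log(w) + log_gauss(x, m, v)
                   for w, m, v in zip(weights, means, variances)])
        for x in frames
    )


def hmm_loglik(frames, start, trans, means, variances):
    """Forward algorithm (log domain) for an HMM with Gaussian emissions."""
    n = len(start)
    alpha = [safe_log(start[s]) + log_gauss(frames[0], means[s], variances[s])
             for s in range(n)]
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[r] + safe_log(trans[r][s]) for r in range(n)])
            + log_gauss(x, means[s], variances[s])
            for s in range(n)
        ]
    return logsumexp(alpha)


# Hypothetical toy parameters: state 0 is a low-valued onset, state 1 a
# high-valued sustained segment; the HMM may only move from 0 to 1.
means, variances = [0.0, 5.0], [1.0, 1.0]
weights = [0.5, 0.5]                 # GMM mixture weights
start = [1.0, 0.0]                   # HMM always starts in state 0
trans = [[0.7, 0.3], [0.0, 1.0]]     # left-to-right transitions

rising = [0.1, 0.2, 4.8, 5.1]        # invented feature trajectory
reversed_frames = rising[::-1]       # same frames, opposite temporal order

gmm_rising = gmm_loglik(rising, weights, means, variances)
gmm_reversed = gmm_loglik(reversed_frames, weights, means, variances)
hmm_rising = hmm_loglik(rising, start, trans, means, variances)
hmm_reversed = hmm_loglik(reversed_frames, start, trans, means, variances)

print("GMM:", gmm_rising, gmm_reversed)
print("HMM:", hmm_rising, hmm_reversed)
```

In this toy setting the GMM assigns the two orderings the same score up to floating-point rounding, because it only sums per-frame terms, while the left-to-right HMM scores the rising sequence far higher than its reversal. This is one way to read the abstract's statement that the GMM has difficulty representing temporal changes.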

All items held in this repository are protected by copyright.