Number of accesses to this item: 12

Files in this item:
File | Description | Size | Format
ATSIP.2021.4.pdf |  | 1.5 MB | Adobe PDF
Full metadata record
DC Field | Value | Language
dc.contributor.author | Nishikimi, Ryo | en
dc.contributor.author | Nakamura, Eita | en
dc.contributor.author | Goto, Masataka | en
dc.contributor.author | Yoshii, Kazuyoshi | en
dc.contributor.alternative | 中村, 栄太 | ja
dc.date.accessioned | 2025-05-07T01:23:36Z | -
dc.date.available | 2025-05-07T01:23:36Z | -
dc.date.issued | 2021 | -
dc.identifier.uri | http://hdl.handle.net/2433/293769 | -
dc.description.abstract | This paper describes an automatic singing transcription (AST) method that estimates a human-readable musical score of a sung melody from an input music signal. Because of the considerable pitch and temporal variation of a singing voice, a naive cascading approach that estimates an F0 contour and quantizes it with estimated tatum times cannot avoid many pitch and rhythm errors. To solve this problem, we formulate a unified generative model of a music signal that consists of a semi-Markov language model representing the generative process of latent musical notes conditioned on musical keys and an acoustic model based on a convolutional recurrent neural network (CRNN) representing the generative process of an observed music signal from the notes. The resulting CRNN-HSMM hybrid model enables us to estimate the most-likely musical notes from a music signal with the Viterbi algorithm, while leveraging both the grammatical knowledge about musical notes and the expressive power of the CRNN. The experimental results showed that the proposed method outperformed the conventional state-of-the-art method and that the integration of the musical language model with the acoustic model has a positive effect on the AST performance. | en
dc.language.iso | eng | -
dc.publisher | Cambridge University Press (CUP) | en
dc.rights | © The Author(s), 2021. Published by Cambridge University Press in association with Asia Pacific Signal and Information Processing Association. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited. | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | -
dc.subject | Automatic singing transcription | en
dc.subject | Convolutional recurrent neural network | en
dc.subject | Hidden semi-Markov model | en
dc.title | Audio-to-score singing transcription based on a CRNN-HSMM hybrid model | en
dc.type | journal article | -
dc.type.niitype | Journal Article | -
dc.identifier.jtitle | APSIPA Transactions on Signal and Information Processing | en
dc.identifier.volume | 10 | -
dc.identifier.issue | 1 | -
dc.relation.doi | 10.1017/atsip.2021.4 | -
dc.textversion | publisher | -
dc.identifier.artnum | e7 | -
dcterms.accessRights | open access | -
dc.identifier.eissn | 2048-7703 | -
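The abstract describes decoding the most-likely note sequence from the CRNN-HSMM hybrid model with the Viterbi algorithm, where segment scores combine language-model transition and duration terms with frame-level acoustic likelihoods. The following is only an illustrative sketch of generic semi-Markov Viterbi decoding, not the paper's implementation: `hsmm_viterbi`, the uniform initial state, and the stand-in arrays for the CRNN outputs and the key-conditioned language model are all assumptions introduced here.

```python
import numpy as np

def hsmm_viterbi(log_emit, log_trans, log_dur, max_dur):
    """Most-likely (state, duration) segmentation for a hidden semi-Markov model.

    log_emit : (T, S) per-frame log-likelihoods (stand-in for CRNN outputs)
    log_trans: (S, S) note-to-note log transition probabilities (language model)
    log_dur  : (S, D) log duration probabilities for durations 1..D
    Assumes a uniform initial state for simplicity.
    """
    T, S = log_emit.shape
    # Prefix sums so a segment's emission score is one subtraction.
    cum = np.vstack([np.zeros(S), np.cumsum(log_emit, axis=0)])
    delta = np.full((T + 1, S), -np.inf)
    delta[0] = 0.0
    back = np.zeros((T + 1, S, 2), dtype=int)  # (previous state, duration)
    for t in range(1, T + 1):
        for d in range(1, min(max_dur, t) + 1):
            seg = cum[t] - cum[t - d]                 # emission score of segment
            prev = delta[t - d][:, None] + log_trans  # (prev state, cur state)
            best_prev = prev.argmax(axis=0)
            score = prev.max(axis=0) + log_dur[:, d - 1] + seg
            better = score > delta[t]
            delta[t][better] = score[better]
            back[t, better, 0] = best_prev[better]
            back[t, better, 1] = d
    # Backtrace from the best final state.
    path, s, t = [], int(delta[T].argmax()), T
    while t > 0:
        p, d = back[t, s]
        path.append((s, int(d)))
        s, t = int(p), t - d
    return path[::-1]  # list of (state, duration) segments
```

On a toy signal whose first frames favor one pitch state and whose last frames favor another, the decoder returns two segments rather than many one-frame notes, because every extra segment pays an additional transition and duration cost; this is the mechanism by which a note-level language model suppresses the spurious fragments a frame-wise quantization would produce.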
Appears in Collections: Journal Articles, etc.
This item is licensed under a Creative Commons License.