Access count for this item: 96
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
116.00000052.pdf | | 2.63 MB | Adobe PDF | View/Open |
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wu, Yiming | en |
dc.contributor.author | Yoshii, Kazuyoshi | en |
dc.contributor.alternative | 呉, 益明 | ja |
dc.contributor.alternative | 吉井, 和佳 | ja |
dc.date.accessioned | 2023-02-15T09:11:59Z | - |
dc.date.available | 2023-02-15T09:11:59Z | - |
dc.date.issued | 2022-06-21 | - |
dc.identifier.uri | http://hdl.handle.net/2433/279280 | - |
dc.description.abstract | This paper describes a deep generative approach to joint chord and key estimation for music signals. The limited amount of music signals with complete annotations has been the major bottleneck in supervised multi-task learning of a classification model. To overcome this limitation, we integrate the supervised multi-task learning approach with the unsupervised autoencoding approach in a mutually complementary manner. Considering the typical process of music composition, we formulate a hierarchical latent variable model that sequentially generates keys, chords, and chroma vectors. The keys and chords are assumed to follow a language model that represents their relationships and dynamics. In the framework of amortized variational inference (AVI), we introduce a classification model that jointly infers discrete chord and key labels and a recognition model that infers continuous latent features. These models are combined to form a variational autoencoder (VAE) and are trained jointly in a (semi-)supervised manner, where the generative and language models act as regularizers for the classification model. We comprehensively investigate three different architectures for the chord and key classification model, and three different architectures for the language model. Experimental results demonstrate that the VAE-based multi-task learning improves chord estimation as well as key estimation. | en |
dc.language.iso | eng | - |
dc.publisher | Now Publishers | en |
dc.rights | © 2022 Y. Wu and K. Yoshii | en |
dc.rights | This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence, which permits unrestricted re-use, distribution, and reproduction in any medium, for non-commercial use, provided the original work is properly cited. | en |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | - |
dc.subject | Automatic chord estimation | en |
dc.subject | automatic key estimation | en |
dc.subject | variational autoencoder | en |
dc.subject | multi-task learning | en |
dc.title | Joint Chord and Key Estimation Based on a Hierarchical Variational Autoencoder with Multi-task Learning | en |
dc.type | journal article | - |
dc.type.niitype | Journal Article | - |
dc.identifier.jtitle | APSIPA Transactions on Signal and Information Processing | en |
dc.identifier.volume | 11 | - |
dc.identifier.issue | 1 | - |
dc.relation.doi | 10.1561/116.00000052 | - |
dc.textversion | publisher | - |
dc.identifier.artnum | e19 | - |
dcterms.accessRights | open access | - |
datacite.awardNumber | 16H01744 | - |
datacite.awardNumber | 19H04137 | - |
datacite.awardNumber | 19K20340 | - |
datacite.awardNumber | 20K21813 | - |
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-16H01744/ | - |
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-19H04137/ | - |
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-19K20340/ | - |
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-20K21813/ | - |
dc.identifier.eissn | 2048-7703 | - |
jpcoar.funderName | Japan Society for the Promotion of Science | ja |
jpcoar.funderName | Japan Society for the Promotion of Science | ja |
jpcoar.funderName | Japan Society for the Promotion of Science | ja |
jpcoar.funderName | Japan Society for the Promotion of Science | ja |
jpcoar.awardTitle | Computational models of music understanding based on statistical grammar theory and compositional semantics | ja |
jpcoar.awardTitle | Audio-visual music understanding based on integration of recognition and generation processes | ja |
jpcoar.awardTitle | Research on the learning and evolution of music composition based on statistical learning and evolutionary theory | ja |
jpcoar.awardTitle | A universal auditory understanding model for localization, separation, and classification of arbitrary sounds | ja |
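The abstract above describes a (semi-)supervised VAE objective in which the generative model regularizes a classifier: unlabeled frames contribute only a reconstruction-plus-KL term, while annotated frames add a supervised classification term. The following is a minimal numerical sketch of that loss structure, assuming a Gaussian continuous latent and a softmax chord classifier; all function names, variable names, and the weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL divergence between N(mu, diag(exp(logvar))) and the standard normal prior."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def cross_entropy(probs, label):
    """Supervised classification loss for one annotated frame (hypothetical helper)."""
    return -np.log(probs[label] + 1e-12)

def semi_supervised_loss(recon_nll, mu, logvar, chord_probs,
                         chord_label=None, alpha=1.0):
    """Negative ELBO, plus a classification term when a label is available.

    recon_nll   : negative log-likelihood of the chroma vector under the decoder
    mu, logvar  : recognition model's Gaussian posterior over the continuous latent
    chord_probs : classifier's posterior over discrete chord labels
    chord_label : ground-truth chord index, or None for unlabeled data
    alpha       : assumed weight of the supervised term
    """
    loss = recon_nll + gaussian_kl(mu, logvar)
    if chord_label is not None:  # supervised branch: classifier is trained directly
        loss += alpha * cross_entropy(chord_probs, chord_label)
    return loss

# Toy usage on a single frame with a 4-dimensional latent and 3 chord classes
mu, logvar = np.zeros(4), np.zeros(4)
probs = np.array([0.7, 0.2, 0.1])
print(semi_supervised_loss(recon_nll=2.0, mu=mu, logvar=logvar,
                           chord_probs=probs, chord_label=0))
```

In this framing, the same reconstruction and KL terms apply to every frame, so the generative model shapes the latent space even where annotations are missing, which is the complementarity between the supervised and autoencoding approaches the abstract refers to.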
Appears in Collections: | Journal Articles |

This item is licensed under: Creative Commons License