Number of accesses to this item: 12

Files in this item:
File | Description | Size | Format
ATSIP.2021.4.pdf |  | 1.5 MB | Adobe PDF
Full metadata record
DC Field | Value | Language
dc.contributor.author | Nishikimi, Ryo | en
dc.contributor.author | Nakamura, Eita | en
dc.contributor.author | Goto, Masataka | en
dc.contributor.author | Yoshii, Kazuyoshi | en
dc.contributor.alternative | 中村, 栄太 | ja
dc.date.accessioned | 2025-05-07T01:23:36Z | -
dc.date.available | 2025-05-07T01:23:36Z | -
dc.date.issued | 2021 | -
dc.identifier.uri | http://hdl.handle.net/2433/293769 | -
dc.description.abstract | This paper describes an automatic singing transcription (AST) method that estimates a human-readable musical score of a sung melody from an input music signal. Because of the considerable pitch and temporal variation of a singing voice, a naive cascading approach that estimates an F0 contour and quantizes it with estimated tatum times cannot avoid many pitch and rhythm errors. To solve this problem, we formulate a unified generative model of a music signal that consists of a semi-Markov language model representing the generative process of latent musical notes conditioned on musical keys and an acoustic model based on a convolutional recurrent neural network (CRNN) representing the generative process of an observed music signal from the notes. The resulting CRNN-HSMM hybrid model enables us to estimate the most-likely musical notes from a music signal with the Viterbi algorithm, while leveraging both the grammatical knowledge about musical notes and the expressive power of the CRNN. The experimental results showed that the proposed method outperformed the conventional state-of-the-art method and that the integration of the musical language model with the acoustic model has a positive effect on the AST performance. | en
dc.language.iso | eng | -
dc.publisher | Cambridge University Press (CUP) | en
dc.rights | © The Author(s), 2021. Published by Cambridge University Press in association with Asia Pacific Signal and Information Processing Association. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited. | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | -
dc.subject | Automatic singing transcription | en
dc.subject | Convolutional recurrent neural network | en
dc.subject | Hidden semi-Markov model | en
dc.title | Audio-to-score singing transcription based on a CRNN-HSMM hybrid model | en
dc.type | journal article | -
dc.type.niitype | Journal Article | -
dc.identifier.jtitle | APSIPA Transactions on Signal and Information Processing | en
dc.identifier.volume | 10 | -
dc.identifier.issue | 1 | -
dc.relation.doi | 10.1017/atsip.2021.4 | -
dc.textversion | publisher | -
dc.identifier.artnum | e7 | -
dcterms.accessRights | open access | -
dc.identifier.eissn | 2048-7703 | -
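The abstract describes decoding the most-likely note sequence from the CRNN-HSMM hybrid model with the Viterbi algorithm, where segment scores combine language-model transition and duration terms with frame-level acoustic likelihoods. The following is only an illustrative sketch of generic semi-Markov Viterbi decoding, not the paper's implementation: `hsmm_viterbi`, the uniform initial state, and the stand-in arrays for the CRNN outputs and the key-conditioned language model are all assumptions introduced here.

```python
import numpy as np

def hsmm_viterbi(log_emit, log_trans, log_dur, max_dur):
    """Most-likely (state, duration) segmentation for a hidden semi-Markov model.

    log_emit : (T, S) per-frame log-likelihoods (stand-in for CRNN outputs)
    log_trans: (S, S) note-to-note log transition probabilities (language model)
    log_dur  : (S, D) log duration probabilities for durations 1..D
    Assumes a uniform initial state for simplicity.
    """
    T, S = log_emit.shape
    # Prefix sums so a segment's emission score is one subtraction.
    cum = np.vstack([np.zeros(S), np.cumsum(log_emit, axis=0)])
    delta = np.full((T + 1, S), -np.inf)
    delta[0] = 0.0
    back = np.zeros((T + 1, S, 2), dtype=int)  # (previous state, duration)
    for t in range(1, T + 1):
        for d in range(1, min(max_dur, t) + 1):
            seg = cum[t] - cum[t - d]                 # emission score of segment
            prev = delta[t - d][:, None] + log_trans  # (prev state, cur state)
            best_prev = prev.argmax(axis=0)
            score = prev.max(axis=0) + log_dur[:, d - 1] + seg
            better = score > delta[t]
            delta[t][better] = score[better]
            back[t, better, 0] = best_prev[better]
            back[t, better, 1] = d
    # Backtrace from the best final state.
    path, s, t = [], int(delta[T].argmax()), T
    while t > 0:
        p, d = back[t, s]
        path.append((s, int(d)))
        s, t = int(p), t - d
    return path[::-1]  # list of (state, duration) segments
```

On a toy signal whose first frames favor one pitch state and whose last frames favor another, the decoder returns two segments rather than many one-frame notes, because every extra segment pays an additional transition and duration cost; this is the mechanism by which a note-level language model suppresses the spurious fragments a frame-wise quantization would produce.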
Appears in Collections: Journal Articles, etc.
This item is licensed under a Creative Commons License.