Downloads: 32

Files in this item:
File | Description | Size | Format
zenodo.7316742.pdf |  | 485.9 kB | Adobe PDF
Full metadata record
DC Field | Value | Language
dc.contributor.author | Deng, Tengyu | en
dc.contributor.author | Nakamura, Eita | en
dc.contributor.author | Yoshii, Kazuyoshi | en
dc.contributor.alternative | 鄧, 腾煜 | ja
dc.contributor.alternative | 中村, 栄太 | ja
dc.contributor.alternative | 吉井, 和佳 | ja
dc.date.accessioned | 2024-03-21T02:41:44Z | -
dc.date.available | 2024-03-21T02:41:44Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://hdl.handle.net/2433/287439 | -
dc.description | International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India, December 4-8, 2022 | en
dc.description.abstract | This paper presents an automatic lyrics transcription (ALT) method for music recordings that leverages the framewise semitone-level sung pitches estimated in a multi-task learning framework. Compared to automatic speech recognition (ASR), ALT is challenging due to the insufficiency of training data and the variation and contamination of acoustic features caused by singing expressions and accompaniment sounds. The domain adaptation approach has thus recently been taken for updating an ASR model pre-trained from sufficient speech data. In the naive application of the end-to-end approach to ALT, the internal audio-to-lyrics alignment often fails due to the time-stretching nature of singing features. To stabilize the alignment, we make use of the semi-synchronous relationships between notes and characters. Specifically, a convolutional recurrent neural network (CRNN) is used for estimating the semitone-level pitches with note onset times while eliminating the intra- and inter-note pitch variations. This estimate helps an end-to-end ALT model based on connectionist temporal classification (CTC) learn correct audio-to-character alignment and mapping, where the ALT model is trained jointly with the pitch and onset estimation model. The experimental results show the usefulness of the pitch and onset information in ALT. | en
dc.language.iso | eng | -
dc.publisher | ISMIR | en
dc.rights | © T. Deng, E. Nakamura, and K. Yoshii. | en
dc.rights | Creative Commons Attribution 4.0 International | en
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | -
dc.subject | ismir | en
dc.subject | ismir2022 | en
dc.title | End-to-End Lyrics Transcription Informed by Pitch and Onset Estimation | en
dc.type | conference paper | -
dc.type.niitype | Conference Paper | -
dc.identifier.jtitle | Proceedings of the 23rd International Society for Music Information Retrieval Conference | en
dc.identifier.spage | 633 | -
dc.identifier.epage | 639 | -
dc.relation.doi | 10.5281/zenodo.7316742 | -
dc.textversion | publisher | -
dcterms.accessRights | open access | -
datacite.awardNumber | 19H04137 | -
datacite.awardNumber | 20K21813 | -
datacite.awardNumber | 21K02846 | -
datacite.awardNumber | 21K12187 | -
datacite.awardNumber | 22H03661 | -
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-19H04137/ | -
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-20K21813/ | -
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-21K02846/ | -
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-21K12187/ | -
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-22H03661/ | -
jpcoar.funderName | 日本学術振興会 (Japan Society for the Promotion of Science) | ja
jpcoar.funderName | 日本学術振興会 (Japan Society for the Promotion of Science) | ja
jpcoar.funderName | 日本学術振興会 (Japan Society for the Promotion of Science) | ja
jpcoar.funderName | 日本学術振興会 (Japan Society for the Promotion of Science) | ja
jpcoar.funderName | 日本学術振興会 (Japan Society for the Promotion of Science) | ja
jpcoar.awardTitle | 認識・生成過程の統合に基づく視聴覚音楽理解 (Audio-visual music understanding based on the integration of recognition and generation processes) | ja
jpcoar.awardTitle | あらゆる音の定位・分離・分類のためのユニバーサル音響理解モデル (A universal auditory understanding model for localization, separation, and classification of all kinds of sounds) | ja
jpcoar.awardTitle | ピアノ演奏技能の習得 --その身体知の獲得過程モデル作成と習得支援の研究 (Acquisition of piano performance skills: modeling the acquisition process of embodied knowledge and supporting skill learning) | ja
jpcoar.awardTitle | 自動楽曲推薦・編曲とタテ線譜・自動伴奏システムによる中高齢者のピアノ演奏支援 (Supporting piano performance by middle-aged and older adults through automatic music recommendation and arrangement, vertical-line notation, and automatic accompaniment systems) | ja
jpcoar.awardTitle | 深層・統計学習と非平衡系物理の理論に基づく文化と知能の進化モデルの研究 (Research on evolutionary models of culture and intelligence based on deep/statistical learning and the theory of non-equilibrium physics) | ja
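
The abstract above describes a multi-task architecture: a shared CRNN encoder estimates framewise semitone-level pitches and note onsets while a CTC-based head transcribes the lyrics, and all parts are trained jointly. As a reading aid, here is a minimal PyTorch sketch of that joint setup; every module name, layer size, pitch-class count, and loss weight is an illustrative assumption, not the authors' published implementation.

```python
# Minimal multi-task sketch (PyTorch): a shared CRNN encoder feeds three
# heads -- framewise pitch, note onset, and CTC-based lyrics. All sizes
# and weights are illustrative assumptions, not the paper's actual model.
import torch
import torch.nn as nn

class MultiTaskALT(nn.Module):
    def __init__(self, n_mels=80, n_pitch=129, n_chars=30, hidden=256):
        super().__init__()
        # Convolutional front-end over the log-mel spectrogram.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Recurrent encoder shared by all three tasks (the "CRNN" part).
        self.rnn = nn.GRU(32 * n_mels, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.pitch_head = nn.Linear(2 * hidden, n_pitch)  # semitones + "unvoiced"
        self.onset_head = nn.Linear(2 * hidden, 1)        # note-onset probability
        self.char_head = nn.Linear(2 * hidden, n_chars)   # characters + CTC blank

    def forward(self, mel):                  # mel: (batch, frames, n_mels)
        x = self.conv(mel.unsqueeze(1))      # -> (batch, 32, frames, n_mels)
        b, c, t, f = x.shape
        h, _ = self.rnn(x.permute(0, 2, 1, 3).reshape(b, t, c * f))
        return self.pitch_head(h), self.onset_head(h), self.char_head(h)

def joint_loss(pitch_logits, onset_logits, char_logits,
               pitch_tgt, onset_tgt, char_tgt, frame_lens, char_lens,
               w_pitch=1.0, w_onset=1.0, w_ctc=1.0):
    """CTC lyrics loss plus auxiliary framewise pitch and onset losses."""
    ctc = nn.CTCLoss(blank=0)(
        char_logits.log_softmax(-1).transpose(0, 1),  # (frames, batch, chars)
        char_tgt, frame_lens, char_lens)
    pitch = nn.CrossEntropyLoss()(
        pitch_logits.reshape(-1, pitch_logits.size(-1)), pitch_tgt.reshape(-1))
    onset = nn.BCEWithLogitsLoss()(onset_logits.squeeze(-1), onset_tgt)
    return w_ctc * ctc + w_pitch * pitch + w_onset * onset
```

Jointly minimizing the three terms forces the shared encoder to represent note-level structure, which is how, per the abstract, the pitch and onset estimates help stabilize the CTC model's audio-to-character alignment.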
Appears in Collections: Journal Articles


This item is licensed under the following license: Creative Commons Attribution 4.0 International.