Files in this item:

File | Description | Size | Format
---|---|---|---
zenodo.7316742.pdf |  | 485.9 kB | Adobe PDF
Full metadata record

DC Field | Value | Language |
---|---|---|
dc.contributor.author | Deng, Tengyu | en |
dc.contributor.author | Nakamura, Eita | en |
dc.contributor.author | Yoshii, Kazuyoshi | en |
dc.contributor.alternative | 鄧, 腾煜 | ja |
dc.contributor.alternative | 中村, 栄太 | ja |
dc.contributor.alternative | 吉井, 和佳 | ja |
dc.date.accessioned | 2024-03-21T02:41:44Z | - |
dc.date.available | 2024-03-21T02:41:44Z | - |
dc.date.issued | 2022 | - |
dc.identifier.uri | http://hdl.handle.net/2433/287439 | - |
dc.description | International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India, December 4-8, 2022 | en |
dc.description.abstract | This paper presents an automatic lyrics transcription (ALT) method for music recordings that leverages the framewise semitone-level sung pitches estimated in a multi-task learning framework. Compared to automatic speech recognition (ASR), ALT is challenging due to the insufficiency of training data and the variation and contamination of acoustic features caused by singing expressions and accompaniment sounds. The domain adaptation approach has thus recently been taken for updating an ASR model pre-trained from sufficient speech data. In the naive application of the end-to-end approach to ALT, the internal audio-to-lyrics alignment often fails due to the time-stretching nature of singing features. To stabilize the alignment, we make use of the semi-synchronous relationships between notes and characters. Specifically, a convolutional recurrent neural network (CRNN) is used for estimating the semitone-level pitches with note onset times while eliminating the intra- and inter-note pitch variations. This estimate helps an end-to-end ALT model based on connectionist temporal classification (CTC) learn correct audio-to-character alignment and mapping, where the ALT model is trained jointly with the pitch and onset estimation model. The experimental results show the usefulness of the pitch and onset information in ALT. | en |
dc.language.iso | eng | - |
dc.publisher | ISMIR | en |
dc.rights | © T. Deng, E. Nakamura, and K. Yoshii. | en |
dc.rights | Creative Commons Attribution 4.0 International | en |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | - |
dc.subject | ismir | en |
dc.subject | ismir2022 | en |
dc.title | End-to-End Lyrics Transcription Informed by Pitch and Onset Estimation | en |
dc.type | conference paper | - |
dc.type.niitype | Conference Paper | - |
dc.identifier.jtitle | Proceedings of the 23rd International Society for Music Information Retrieval Conference | en |
dc.identifier.spage | 633 | - |
dc.identifier.epage | 639 | - |
dc.relation.doi | 10.5281/zenodo.7316742 | - |
dc.textversion | publisher | - |
dcterms.accessRights | open access | - |
datacite.awardNumber | 19H04137 | - |
datacite.awardNumber | 20K21813 | - |
datacite.awardNumber | 21K02846 | - |
datacite.awardNumber | 21K12187 | - |
datacite.awardNumber | 22H03661 | - |
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-19H04137/ | - |
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-20K21813/ | - |
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-21K02846/ | - |
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-21K12187/ | - |
datacite.awardNumber.uri | https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-22H03661/ | - |
jpcoar.funderName | 日本学術振興会 | ja |
jpcoar.funderName | 日本学術振興会 | ja |
jpcoar.funderName | 日本学術振興会 | ja |
jpcoar.funderName | 日本学術振興会 | ja |
jpcoar.funderName | 日本学術振興会 | ja |
jpcoar.awardTitle | 認識・生成過程の統合に基づく視聴覚音楽理解 | ja |
jpcoar.awardTitle | あらゆる音の定位・分離・分類のためのユニバーサル音響理解モデル | ja |
jpcoar.awardTitle | ピアノ演奏技能の習得 --その身体知の獲得過程モデル作成と習得支援の研究 | ja |
jpcoar.awardTitle | 自動楽曲推薦・編曲とタテ線譜・自動伴奏システムによる中高齢者のピアノ演奏支援 | ja |
jpcoar.awardTitle | 深層・統計学習と非平衡系物理の理論に基づく文化と知能の進化モデルの研究 | ja |
Appears in collections: | Journal Articles, etc. |

This item is licensed under: Creative Commons License
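The abstract describes an end-to-end ALT model based on connectionist temporal classification (CTC), where stable audio-to-character alignment is the key difficulty. As a minimal illustrative sketch only (not the authors' implementation), the following shows the CTC greedy-decoding rule by which a framewise best-path label sequence is collapsed into an output string; the blank symbol `"-"` and the toy frame sequence are assumptions for the example.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse a per-frame best-path label sequence into an output string.

    CTC decoding first merges consecutive repeated labels, then removes
    blank symbols, so many frames can map to a single character.
    """
    out = []
    prev = None
    for lab in frame_labels:
        # Emit a label only when it changes from the previous frame
        # and is not the blank symbol.
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)


# A sustained sung vowel occupies many frames; CTC maps them to one character.
frames = list("--hh-ee--lll-lo--")
print(ctc_greedy_decode(frames))  # -> hello
```

This many-to-one mapping is why time-stretched singing (long held notes) destabilizes the learned alignment, and why the framewise note-onset cues described in the abstract can help the model place character boundaries.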