Downloads: 32
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
zenodo.7316742.pdf | | 485.9 kB | Adobe PDF | View/Open |
Title: | End-to-End Lyrics Transcription Informed by Pitch and Onset Estimation |
Authors: | Deng, Tengyu; Nakamura, Eita https://orcid.org/0000-0003-4097-6027 (unconfirmed); Yoshii, Kazuyoshi https://orcid.org/0000-0001-8387-8609 (unconfirmed) |
Alternative author names: | 鄧, 腾煜; 中村, 栄太; 吉井, 和佳 |
Keywords: | ismir; ismir2022 |
Issue date: | 2022 |
Publisher: | ISMIR |
Journal title: | Proceedings of the 23rd International Society for Music Information Retrieval Conference |
Start page: | 633 |
End page: | 639 |
Abstract: | This paper presents an automatic lyrics transcription (ALT) method for music recordings that leverages the framewise semitone-level sung pitches estimated in a multi-task learning framework. Compared to automatic speech recognition (ASR), ALT is challenging due to the insufficiency of training data and the variation and contamination of acoustic features caused by singing expressions and accompaniment sounds. The domain adaptation approach has thus recently been taken for updating an ASR model pre-trained from sufficient speech data. In the naive application of the end-to-end approach to ALT, the internal audio-to-lyrics alignment often fails due to the time-stretching nature of singing features. To stabilize the alignment, we make use of the semi-synchronous relationships between notes and characters. Specifically, a convolutional recurrent neural network (CRNN) is used for estimating the semitone-level pitches with note onset times while eliminating the intra- and inter-note pitch variations. This estimate helps an end-to-end ALT model based on connectionist temporal classification (CTC) learn correct audio-to-character alignment and mapping, where the ALT model is trained jointly with the pitch and onset estimation model. The experimental results show the usefulness of the pitch and onset information in ALT. |
Description: | International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India, December 4-8, 2022 |
Rights: | © T. Deng, E. Nakamura, and K. Yoshii. Creative Commons Attribution 4.0 International |
URI: | http://hdl.handle.net/2433/287439 |
DOI (publisher version): | 10.5281/zenodo.7316742 |
Appears in collections: | Journal Articles, etc. |
This item is licensed under: Creative Commons License