Automatic Speech Recognition for the Archive of Ainu Folklores (アイヌ民話アーカイブに対する音声認識)

Matsuura, Kohei; Mimura, Masato; Kawahara, Tatsuya

http://hdl.handle.net/2433/277787

Files in this item:

File	Description	Size	Format
jnlp.28.824.pdf		556.08 kB	Adobe PDF

Full metadata record

DC Field	Value	Language
dc.contributor.author	松浦, 孝平	ja
dc.contributor.author	三村, 正人	ja
dc.contributor.author	河原, 達也	ja
dc.contributor.alternative	Matsuura, Kohei	en
dc.contributor.alternative	Mimura, Masato	en
dc.contributor.alternative	Kawahara, Tatsuya	en
dc.date.accessioned	2022-12-13T02:32:54Z	-
dc.date.available	2022-12-13T02:32:54Z	-
dc.date.issued	2021	-
dc.identifier.uri	http://hdl.handle.net/2433/277787	-
dc.description.abstract	本稿では，アイヌ民話（ウウェペケㇾ）の音声認識に関する我々の取り組みについて述べる．まず，2 つの博物館から提供されたアイヌ語アーカイブのデータを元に，沙流方言を対象としたアイヌ語音声コーパスを構築した．次に，このコーパスを用いて注意機構モデルに基づく音声認識システムを構成し，音素・音節・ワードピース・単語の 4 つの認識単位について検討した．その結果，音節単位での音声認識精度が最も高くなることがわかり，話者クローズド条件と話者オープン条件のそれぞれについて，音素認識精度で 93.7% と 86.2%，単語認識精度で 78.3% と 61.4% を実現した．音声認識精度が話者オープン条件において大幅に低下する問題に対して，CycleGAN を用いた教師なし話者適応を提案した．これは，学習データ内の話者の音声から認識対象話者の音声への写像を CycleGAN に学習させ，学習データ内の音声を全て認識対象話者風の音声に変換するものである．本手法によって最大で相対 60.6% の音素誤り率の改善を得た．さらに，日本語とアイヌ語が混合した音声における言語識別についても検討を行い，音素認識と単語認識を用いた構成で一定の識別性能を達成できることを示した．	ja
dc.description.abstract	This article describes our work on automatic speech recognition (ASR) of Ainu folklore (Uwepeker). First, we constructed an Ainu speech corpus for the Saru dialect from Ainu-language archive data provided by two museums. Next, we built an ASR system based on an attention-based encoder-decoder model and compared four recognition units: phones, syllables, word pieces, and words. The syllable unit gave the highest accuracy: we achieved phone recognition accuracies of 93.7% and 86.2%, and word recognition accuracies of 78.3% and 61.4%, for the speaker-closed and speaker-open conditions, respectively. To address the significant degradation in the speaker-open condition, we propose an unsupervised speaker adaptation method using a CycleGAN. In this method, a CycleGAN learns a mapping from the voices of the speakers in the training data to the voice of the target speaker, and all speech in the training data is then converted into speech resembling the target speaker's. This method achieved a relative reduction in phone error rate of up to 60.6%. In addition, we investigated language identification in mixed Japanese and Ainu speech and achieved reasonable performance by cascading phone and word recognition modules.	en
dc.language.iso	jpn	-
dc.publisher	言語処理学会	ja
dc.publisher.alternative	Association for Natural Language Processing	en
dc.rights	© 2021 一般社団法人　言語処理学会	ja
dc.rights	Licensed under CC BY 4.0	en
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	-
dc.subject	音声認識	ja
dc.subject	言語識別	ja
dc.subject	アイヌ語	ja
dc.subject	Automatic Speech Recognition	en
dc.subject	Language Identification	en
dc.subject	Ainu	en
dc.title	アイヌ民話アーカイブに対する音声認識	ja
dc.title.alternative	Automatic Speech Recognition for the Archive of Ainu Folklores	en
dc.type	journal article	-
dc.type.niitype	Journal Article	-
dc.identifier.jtitle	自然言語処理	ja
dc.identifier.volume	28	-
dc.identifier.issue	3	-
dc.identifier.spage	824	-
dc.identifier.epage	846	-
dc.relation.doi	10.5715/jnlp.28.824	-
dc.textversion	publisher	-
dcterms.accessRights	open access	-
dc.identifier.pissn	1340-7619	-
dc.identifier.eissn	2185-8314	-
dc.identifier.jtitle-alternative	Journal of Natural Language Processing	en
Appears in Collections:	Journal Articles

This item is licensed under a Creative Commons License