Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings

Soky, Kak; Li, Sheng; Chu, Chenhui; Kawahara, Tatsuya

http://hdl.handle.net/2433/286875

Files in this item:

File	Description	Size	Format
s2717554523500248.pdf		281.89 kB	Adobe PDF

Full metadata record

DC Field	Value	Language
dc.contributor.author	Soky, Kak	en
dc.contributor.author	Li, Sheng	en
dc.contributor.author	Chu, Chenhui	en
dc.contributor.author	Kawahara, Tatsuya	en
dc.contributor.alternative	褚, 晨翚	ja
dc.contributor.alternative	河原, 達也	ja
dc.date.accessioned	2024-02-06T01:07:05Z	-
dc.date.available	2024-02-06T01:07:05Z	-
dc.date.issued	2023-12	-
dc.identifier.uri	http://hdl.handle.net/2433/286875	-
dc.description.abstract	This study investigates the effective incorporation of meta-information, such as domain and language, in finetuning a pretrained model based on self-supervised learning (SSL) for automatic speech recognition (ASR) in very low-resource settings. SSL pretrained models have been shown to achieve performance comparable to or even better than conventional end-to-end systems, even when finetuned with a small dataset. However, they still require a considerable amount of labeled data from the specific target dataset, such as 10 hours, to achieve satisfactory performance. Thus, we propose to exploit heterogeneous datasets that are partially matched in either language or domain, and to apply multi-task learning (MTL) or adversarial learning using the meta-information. The finetuning comprises (1) domain adaptation, which uses in-domain multilingual datasets, and (2) language adaptation, which uses datasets of the same language but different domains. The auxiliary task is domain identification for language adaptation and language identification for domain adaptation. We then embed the output of the auxiliary task into the encoder output of the ASR task. The target dataset is the Khmer corpus of the ECCC (Extraordinary Chambers in the Courts of Cambodia), in sizes ranging from one hour to 10 hours. The experimental evaluations demonstrate that fusing the meta-information in MTL or adversarial learning significantly improves ASR accuracy. Moreover, a two-step adaptation method, which first conducts domain adaptation and then language adaptation, is the most effective. We also show that only 5 hours of labeled target data gives almost saturated performance.	en
dc.language.iso	eng	-
dc.publisher	World Scientific Pub Co Pte Ltd	en
dc.rights	Electronic version of an article published as International Journal of Asian Language Processing, Vol. 33, No. 04, 2350024 (2023) © World Scientific Publishing Company, https://doi.org/10.1142/S2717554523500248	en
dc.rights	The full-text file will be made open to the public on 1 December 2024 in accordance with publisher's 'Terms and Conditions for Self-Archiving'.	en
dc.rights	This is not the published version. Please cite only the published version.	en
dc.subject	Speech recognition	en
dc.subject	low-resource Khmer language	en
dc.subject	domain adaptation	en
dc.subject	language adaptation	en
dc.subject	meta information	en
dc.subject	multi-task learning	en
dc.subject	adversarial learning	en
dc.title	Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings	en
dc.type	journal article	-
dc.type.niitype	Journal Article	-
dc.identifier.jtitle	International Journal of Asian Language Processing	en
dc.identifier.volume	33	-
dc.identifier.issue	04	-
dc.relation.doi	10.1142/s2717554523500248	-
dc.textversion	author	-
dc.identifier.artnum	2350024	-
dcterms.accessRights	embargoed access	-
datacite.date.available	2024-12-01	-
dc.identifier.pissn	2717-5545	-
dc.identifier.eissn	2424-791X	-
Appears in Collections:	Journal Articles, etc.
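The abstract describes two ways of using the domain or language label during finetuning, MTL or adversarial learning with an auxiliary identification task, and it also describes embedding the auxiliary output into the ASR encoder output. The pure-Python sketch below illustrates both ideas. It is not the authors' implementation. Several choices are assumptions: fusing by adding a posterior-weighted class embedding to every encoder frame, the names `fuse_meta_embedding` and `encoder_objective`, and the scalar `weight`. The sign flip in the adversarial case is the standard effect of a gradient reversal layer. It is assumed here rather than taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_meta_embedding(encoder_frames: Sequence[Sequence[float]],
                        aux_logits: Sequence[float],
                        embedding_table: Sequence[Sequence[float]]) -> List[Vector]:
    """Hypothetical fusion: add a posterior-weighted meta embedding to each frame.

    encoder_frames: T frames of dimension D (stand-in for the ASR encoder output).
    aux_logits: K scores from the auxiliary domain/language classifier.
    embedding_table: K learned vectors of dimension D, one per class (assumed).
    """
    if len(aux_logits) != len(embedding_table):
        raise ValueError("need one embedding row per auxiliary class")
    dim = len(embedding_table[0])
    if any(len(frame) != dim for frame in encoder_frames):
        raise ValueError("frame and embedding dimensions must match")
    probs = softmax(aux_logits)
    # Expected class embedding under the auxiliary posterior.
    meta = [sum(p * row[d] for p, row in zip(probs, embedding_table))
            for d in range(dim)]
    return [[h + e for h, e in zip(frame, meta)] for frame in encoder_frames]


def encoder_objective(asr_loss: float, aux_loss: float,
                      weight: float, adversarial: bool) -> float:
    """Loss the shared encoder minimizes (hypothetical weighting).

    MTL: the encoder also helps the auxiliary task, giving L_asr + w * L_aux.
    Adversarial: the encoder works against it, giving L_asr - w * L_aux, which
    is what a gradient reversal layer yields. The auxiliary classifier itself
    still minimizes L_aux in both cases.
    """
    sign = -1.0 if adversarial else 1.0
    return asr_loss + sign * weight * aux_loss
```

For example, `fuse_meta_embedding([[0.0, 1.0], [2.0, 3.0]], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` uses uniform auxiliary logits, so each frame is shifted by the average of the two class embeddings. In a real system these would be framework tensors with learned parameters, and the weight would be tuned on held-out data.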

All items in this repository are protected by copyright.