Downloads: 76

Files in this item:
File: TASLP.2021.3138719.pdf (5.63 MB, Adobe PDF)
Full metadata record
DC Field: Value (Language)
dc.contributor.author: Zhao, Yuting (en)
dc.contributor.author: Komachi, Mamoru (en)
dc.contributor.author: Kajiwara, Tomoyuki (en)
dc.contributor.author: Chu, Chenhui (en)
dc.date.accessioned: 2022-01-12T08:39:21Z
dc.date.available: 2022-01-12T08:39:21Z
dc.date.issued: 2022
dc.identifier.uri: http://hdl.handle.net/2433/267448
dc.description.abstract: We propose word-region alignment-guided multimodal neural machine translation (MNMT), a novel MNMT model that links the semantic correlation between the textual and visual modalities using word-region alignment (WRA). Existing studies on MNMT have mainly focused on the effect of integrating the visual and textual modalities, but they do not leverage the semantic relevance between the two. We strengthen the semantic correlation between the textual and visual modalities in MNMT by incorporating WRA as a bridge. This proposal is implemented on the two mainstream neural machine translation (NMT) architectures: the recurrent neural network (RNN) and the transformer. Experiments on two public benchmarks, the English-German and English-French translation tasks of the Multi30k dataset and the English-Japanese translation task of the Flickr30kEnt-JP dataset, show that our model significantly improves over competitive baselines across different evaluation metrics and outperforms most existing MNMT models. For example, BLEU improves by 1.0 for English-German and by 1.1 for English-French on the Multi30k test2016 set, and by 0.7 for English-Japanese on the Flickr30kEnt-JP test set. Further analysis demonstrates that integrating WRA leads to better use of visual information and thus better translation performance. (en)
dc.language.iso: eng
dc.publisher: IEEE (en)
dc.rights: This work is licensed under a Creative Commons Attribution 4.0 License. (en)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Graphics (en)
dc.subject: Magnetization (en)
dc.subject: Magnetostatics (en)
dc.subject: Speech processing (en)
dc.subject: Permeability (en)
dc.subject: Image color analysis (en)
dc.subject: Guidelines (en)
dc.subject: Multi30k (en)
dc.subject: multimodal machine translation (en)
dc.subject: semantic correlation (en)
dc.subject: vision and language (en)
dc.subject: word-region alignment (en)
dc.title: Word-Region Alignment-Guided Multimodal Neural Machine Translation (en)
dc.type: journal article
dc.type.niitype: Journal Article
dc.identifier.jtitle: IEEE/ACM Transactions on Audio, Speech, and Language Processing (en)
dc.identifier.volume: 30
dc.identifier.spage: 244
dc.identifier.epage: 259
dc.relation.doi: 10.1109/TASLP.2021.3138719
dc.textversion: publisher
dcterms.accessRights: open access
datacite.awardNumber: 19K20343
datacite.awardNumber.uri: https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-19K20343/
dc.identifier.pissn: 2329-9290
dc.identifier.eissn: 2329-9304
jpcoar.funderName: 日本学術振興会 (Japan Society for the Promotion of Science) (ja)
jpcoar.awardTitle: マルチモーダルデータからの対訳資源の抽出によるニューラル機械翻訳 ("Neural machine translation via extraction of bilingual resources from multimodal data") (ja)
Appears in collections: Papers published in academic journals, etc.

This item is licensed under a Creative Commons License.