Downloads: 76

Files in this item:
File: TASLP.2021.3138719.pdf (5.63 MB, Adobe PDF)
Full metadata record
DC Field: Value (Language)
dc.contributor.author: Zhao, Yuting (en)
dc.contributor.author: Komachi, Mamoru (en)
dc.contributor.author: Kajiwara, Tomoyuki (en)
dc.contributor.author: Chu, Chenhui (en)
dc.date.accessioned: 2022-01-12T08:39:21Z
dc.date.available: 2022-01-12T08:39:21Z
dc.date.issued: 2022
dc.identifier.uri: http://hdl.handle.net/2433/267448
dc.description.abstract: We propose word-region alignment-guided multimodal neural machine translation (MNMT), a novel MNMT model that links the semantic correlation between the textual and visual modalities using word-region alignment (WRA). Existing studies on MNMT have mainly focused on the effect of integrating the visual and textual modalities, but they do not leverage the semantic relevance between the two. We strengthen the semantic correlation between the textual and visual modalities in MNMT by incorporating WRA as a bridge. This proposal is implemented on the two mainstream neural machine translation (NMT) architectures: the recurrent neural network (RNN) and the transformer. Experiments on two public benchmarks, the English-German and English-French translation tasks of the Multi30k dataset and the English-Japanese translation task of the Flickr30kEnt-JP dataset, show that our model significantly improves over competitive baselines across different evaluation metrics and outperforms most existing MNMT models. For example, BLEU improves by 1.0 for English-German and by 1.1 for English-French on the Multi30k test2016 set, and by 0.7 for English-Japanese on the Flickr30kEnt-JP test set. Further analysis demonstrates that integrating WRA leads to better use of visual information and thus better translation performance. (en)
dc.language.iso: eng
dc.publisher: IEEE (en)
dc.rights: This work is licensed under a Creative Commons Attribution 4.0 License. (en)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Graphics (en)
dc.subject: Magnetization (en)
dc.subject: Magnetostatics (en)
dc.subject: Speech processing (en)
dc.subject: Permeability (en)
dc.subject: Image color analysis (en)
dc.subject: Guidelines (en)
dc.subject: Multi30k (en)
dc.subject: multimodal machine translation (en)
dc.subject: semantic correlation (en)
dc.subject: vision and language (en)
dc.subject: word-region alignment (en)
dc.title: Word-Region Alignment-Guided Multimodal Neural Machine Translation (en)
dc.type: journal article
dc.type.niitype: Journal Article
dc.identifier.jtitle: IEEE/ACM Transactions on Audio, Speech, and Language Processing (en)
dc.identifier.volume: 30
dc.identifier.spage: 244
dc.identifier.epage: 259
dc.relation.doi: 10.1109/TASLP.2021.3138719
dc.textversion: publisher
dcterms.accessRights: open access
datacite.awardNumber: 19K20343
datacite.awardNumber.uri: https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-19K20343/
dc.identifier.pissn: 2329-9290
dc.identifier.eissn: 2329-9304
jpcoar.funderName: 日本学術振興会 (Japan Society for the Promotion of Science) (ja)
jpcoar.awardTitle: マルチモーダルデータからの対訳資源の抽出によるニューラル機械翻訳 ("Neural machine translation via extraction of bilingual resources from multimodal data") (ja)
Appears in collections: Papers published in academic journals, etc.

This item is licensed under a Creative Commons License.