Downloads: 117

Files in this item:
File: j.cviu.2021.103333.pdf (2.01 MB, Adobe PDF)
Title: The Semantic Typology of Visually Grounded Paraphrases
Authors: Chu, Chenhui (https://orcid.org/0000-0001-9848-6384)
Oliveira, Vinicius
Virgo, Felix Giovanni
Otani, Mayu
Garcia, Noa
Nakashima, Yuta
Keywords: Vision and language
Image interpretation
Visually grounded paraphrases
Semantic typology
Dataset
Issue Date: Jan-2022
Publisher: Elsevier
Journal: Computer Vision and Image Understanding
Volume: 215
Article No.: 103333
Abstract: Visually grounded paraphrases (VGPs) are different phrasal expressions describing the same visual concept in an image. Previous studies treat VGP identification as a binary classification task, which ignores the various phenomena behind VGPs (i.e., different linguistic interpretations of the same visual concept), such as linguistic paraphrases and VGPs describing different aspects of a concept. In this paper, we propose a semantic typology for VGPs, aiming to elucidate these phenomena and deepen our understanding of how humans interpret vision through language. We construct a large VGP dataset in which each VGP pair is annotated with the class it belongs to under our typology. In addition, we present a classification model that fuses language and visual features for VGP classification on our dataset. Experiments indicate that joint language and vision representation learning is important for VGP classification. We further demonstrate that our VGP typology can boost the performance of visually grounded textual entailment.
Rights: © 2021 The Author(s). Published by Elsevier Inc.
This is an open access article under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
URI: http://hdl.handle.net/2433/266704
DOI (publisher version): 10.1016/j.cviu.2021.103333
Appears in Collections: Journal Articles
