The Semantic Typology of Visually Grounded Paraphrases

Chu, Chenhui; Oliveira, Vinicius; Virgo, Giovanni, Felix; Otani, Mayu; Garcia, Noa; Nakashima, Yuta

Access count for this item: 179

http://hdl.handle.net/2433/266704

Files in this item:

File	Description	Size	Format
j.cviu.2021.103333.pdf		2.01 MB	Adobe PDF	View/Open

Full metadata record

DC Field	Value	Language
dc.contributor.author	Chu, Chenhui	en
dc.contributor.author	Oliveira, Vinicius	en
dc.contributor.author	Virgo, Giovanni, Felix	en
dc.contributor.author	Otani, Mayu	en
dc.contributor.author	Garcia, Noa	en
dc.contributor.author	Nakashima, Yuta	en
dc.date.accessioned	2021-12-23T09:37:34Z	-
dc.date.available	2021-12-23T09:37:34Z	-
dc.date.issued	2022-01	-
dc.identifier.uri	http://hdl.handle.net/2433/266704	-
dc.description.abstract	Visually grounded paraphrases (VGPs) are different phrasal expressions describing the same visual concept in an image. Previous studies treat VGP identification as a binary classification task, which ignores various phenomena behind VGPs (i.e., different linguistic interpretations of the same visual concept) such as linguistic paraphrases and VGPs from different aspects. In this paper, we propose a semantic typology for VGPs, aiming to elucidate the VGP phenomena and deepen the understanding of how human beings interpret vision with language. We construct a large VGP dataset that annotates the class to which each VGP pair belongs according to our typology. In addition, we present a classification model that fuses language and visual features for VGP classification on our dataset. Experiments indicate that joint language and vision representation learning is important for VGP classification. We further demonstrate that our VGP typology can boost the performance of visually grounded textual entailment.	en
dc.language.iso	eng	-
dc.publisher	Elsevier	en
dc.rights	© 2021 The Author(s). Published by Elsevier Inc.	en
dc.rights	This is an open access article under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	-
dc.subject	Vision and language	en
dc.subject	Image interpretation	en
dc.subject	Visual grounded paraphrases	en
dc.subject	Semantic typology	en
dc.subject	Dataset	en
dc.title	The Semantic Typology of Visually Grounded Paraphrases	en
dc.type	journal article	-
dc.type.niitype	Journal Article	-
dc.identifier.jtitle	Computer Vision and Image Understanding	en
dc.identifier.volume	215	-
dc.relation.doi	10.1016/j.cviu.2021.103333	-
dc.textversion	publisher	-
dc.identifier.artnum	103333	-
dcterms.accessRights	open access	-
datacite.awardNumber	18H03264	-
datacite.awardNumber.uri	https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-18H03264/	-
dc.identifier.pissn	1077-3142	-
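jpcoar.funderName	日本学術振興会 (Japan Society for the Promotion of Science)	ja
jpcoar.awardTitle	知識ベースを活用した視覚情報に関する質疑応答システムの実現 (Realization of a question answering system for visual information using knowledge bases)	ja
Appears in Collections:	Journal Articles, etc.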

This item is licensed under a Creative Commons License