CGNN: Caption-assisted graph neural network for image-text retrieval
Published in: Pattern Recognition Letters, September 2022, Vol. 161, pp. 137–142
Main Authors:
Format: Article
Language: English
Summary:
•We propose a Caption-Assisted Graph Neural Network (CGNN) for image-text matching.
•We generate image captions as auxiliary information to bridge the domain gap.
•Experiments on Flickr30K and MS-COCO show the effectiveness of our framework.

Image-text retrieval has drawn much attention in recent years, and the similarity measure between images and texts plays an important role in it. Most existing works focus on learning global coarse-grained or local fine-grained features for similarity computation. However, the large domain gap between the two modalities is often neglected, which makes it difficult to match images and texts effectively. To deal with this problem, we propose to use auxiliary information, in the form of generated image captions, to bridge the domain gap. A Caption-Assisted Graph Neural Network (CGNN) is then designed to learn the structured relationships among images, captions, and texts. Since the captions and the texts come from the same domain, the domain gap between images and texts can be effectively reduced. With the help of caption information, our model achieves excellent performance on two cross-modal retrieval datasets, Flickr30K and MS-COCO, which demonstrates the effectiveness of our framework.
ISSN: 0167-8655 (print), 1872-7344 (online)
DOI: 10.1016/j.patrec.2022.08.002
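
For illustration, a minimal sketch of the matching scheme the abstract describes: features of a generated caption act as an auxiliary graph node whose message enriches the image representation before it is scored against the text. The class name, feature dimension, single caption-to-image edge, and cosine scoring are all assumptions made for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CaptionAssistedMatcher(nn.Module):
    """Toy caption-assisted matcher (illustrative, not the paper's code)."""

    def __init__(self, dim=256):
        super().__init__()
        # One message-passing step over a tiny graph with image, caption,
        # and text nodes; only the caption -> image edge is modeled here.
        self.edge = nn.Linear(dim, dim)        # transforms the caption message
        self.update = nn.Linear(2 * dim, dim)  # updates the image node

    def forward(self, img, cap, txt):
        # img, cap, txt: (batch, dim) pooled features from pretrained
        # encoders (image encoder, captioner + text encoder, text encoder).
        msg = F.relu(self.edge(cap))                              # caption message
        img = F.relu(self.update(torch.cat([img, msg], dim=-1)))  # image node update
        # Score the caption-enriched image against the text query.
        return F.cosine_similarity(img, txt, dim=-1)

# Usage: similarity scores for a batch of 4 pairs with random features.
model = CaptionAssistedMatcher(dim=256)
img, cap, txt = (torch.randn(4, 256) for _ in range(3))
print(model(img, cap, txt))  # tensor of shape (4,)
```

The full model presumably spans many region, caption, and word nodes; the single-edge version above only illustrates why captions help: being in the text domain, their features can pull the image representation closer to the text side before similarity is computed.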