Co-attention graph convolutional network for visual question answering

Bibliographic Details
Published in: Multimedia Systems 2023-10, Vol. 29 (5), p. 2527-2543
Main Authors: Liu, Chuan; Tan, Ying-Ying; Xia, Tian-Tian; Zhang, Jiajing; Zhu, Ming
Format: Article
Language: English
Description
Summary: Visual Question Answering (VQA) is a challenging task that requires a fine-grained understanding of both the visual content of images and the textual content of questions. Conventional visual attention models, which are designed primarily from the perspective of the attention mechanism, lack the ability to reason about relationships between visual objects and ignore the multimodal interactions between questions and images. In this work, we propose a model that combines a graph convolutional network with a co-attention network to address these problems. The model employs binary relational reasoning as its graph learner module to learn a graph structure representation that captures relationships between visual objects, and then uses spatial graph convolution to learn a question-specific, spatially aware image representation. We then perform parallel co-attention learning by passing the image representations and the question word features through a deep co-attention module. Experimental results show that our model achieves an overall accuracy of 68.67% on the test-std set of the benchmark VQA v2.0 dataset, outperforming most existing models.
ISSN: 0942-4962, 1432-1882
DOI: 10.1007/s00530-023-01125-7
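
Illustrative sketch: The summary describes a three-stage pipeline: a graph learner that relates visual objects conditioned on the question, a graph convolution over the learned graph, and a co-attention step between image and question features. The toy example below is one possible reading of that data flow, not the authors' implementation. Every function name, the element-wise conditioning on the question, the dot-product pairwise scores, and the max-pooled co-attention are assumptions made for illustration. Learned weight matrices and the spatial (bounding-box) terms of the spatial graph convolution are omitted.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def weighted_sum(weights, rows):
    # Attention-weighted sum of feature rows.
    return [sum(w * r[d] for w, r in zip(weights, rows)) for d in range(len(rows[0]))]


def learn_graph(objects, question):
    # Hypothetical graph learner: condition each object on the question
    # (element-wise product, an assumption), score every object pair,
    # and normalise each row into edge weights.
    joint = [[o * q for o, q in zip(obj, question)] for obj in objects]
    scores = matmul(joint, transpose(joint))
    return [softmax(row) for row in scores]


def graph_conv(adjacency, objects):
    # One aggregation step: each object becomes a weighted average of its
    # neighbours. Learned weights and spatial terms are left out.
    return matmul(adjacency, objects)


def co_attention(objects, words):
    # Hypothetical parallel co-attention: an object-word affinity matrix is
    # max-pooled along each axis to get one attention vector per modality.
    affinity = matmul(objects, transpose(words))
    obj_attn = softmax([max(row) for row in affinity])
    word_attn = softmax([max(col) for col in zip(*affinity)])
    return weighted_sum(obj_attn, objects), weighted_sum(word_attn, words), obj_attn, word_attn


# Toy inputs: 3 detected objects and 5 question words, 4 features each.
random.seed(0)
objects = [[random.random() for _ in range(4)] for _ in range(3)]
words = [[random.random() for _ in range(4)] for _ in range(5)]
question = [sum(col) / len(words) for col in zip(*words)]  # mean-pooled question

adjacency = learn_graph(objects, question)
refined = graph_conv(adjacency, objects)
img_vec, q_vec, obj_attn, word_attn = co_attention(refined, words)
```

In a full model, the two attended vectors would typically be fused and passed to an answer classifier. That step is not described in the summary and is left out here.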