
VQA as a factoid question answering problem: A novel approach for knowledge-aware and explainable visual question answering

Bibliographic Details
Published in: Image and Vision Computing, 2021-12, Vol. 116, p. 104328, Article 104328
Main Authors: Narayanan, Abhishek; Rao, Abijna; Prasad, Abhishek; S, Natarajan
Format: Article
Language: English
Description
Summary:
•The proposed model is a free-form, open-ended, and knowledge-aware VQA model.
•VQA is modeled as an explainable, end-to-end factoid question answering problem.
•The model leverages granular details and correlates inter-related details within a scene.
•The model leverages external world knowledge to answer questions.
•The model predicts likely explanations to justify its predicted answers.
With recent advances in machine perception and scene understanding, Visual Question Answering (VQA) has attracted considerable research interest in training neural models that jointly analyze, ground, and reason over the multi-modal space of image content and natural language in order to answer natural language questions about an image. However, although recent work has substantially improved the state of the art for questions answerable from the visual context alone, such models are often incapable of handling questions that require external world knowledge beyond the visible content. Research has recently begun to address knowledge-based VQA as well, but relatively few studies exist and there remains significant room for improvement. Motivated by these challenges, this paper aims to answer free-form, open-ended natural language questions that draw not only on the visual context of an image but also on external world knowledge. Inspired by the human ability to comprehend and reason out answers from a given set of facts, the paper proposes a novel model architecture that casts VQA as a factoid question answering problem and leverages state-of-the-art deep learning techniques to reason over and infer answers to free-form questions, with the goal of advancing the state of the art in open-ended visual question answering.
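To make the central idea concrete, the sketch below illustrates (in plain Python) how VQA can be recast as factoid question answering: facts derived from the image and facts retrieved from an external knowledge source are pooled into a textual context, and the fact that best supports an answer is selected and reused as the explanation. This is a minimal, hypothetical stand-in, not the authors' implementation; the function names, the lexical-overlap scorer, and the example facts are all illustrative assumptions in place of the paper's neural components.

# Illustrative sketch only (not the authors' implementation): a trivial
# lexical-overlap scorer stands in for the paper's neural reasoning module.
from typing import List, Tuple


def build_fact_context(visual_facts: List[str], knowledge_facts: List[str]) -> List[str]:
    """Pool image-derived facts with external world-knowledge facts,
    so the question can be answered as factoid QA over text."""
    return visual_facts + knowledge_facts


def score_fact(question: str, fact: str) -> float:
    """Hypothetical relevance score: fraction of question tokens that
    also occur in the fact (a stand-in for learned attention)."""
    q_tokens = set(question.lower().split())
    f_tokens = set(fact.lower().split())
    return len(q_tokens & f_tokens) / max(len(q_tokens), 1)


def select_supporting_fact(question: str, facts: List[str]) -> Tuple[str, float]:
    """Return the fact that best supports an answer; in this sketch the
    selected fact doubles as the explanation justifying the prediction."""
    best = max(facts, key=lambda f: score_fact(question, f))
    return best, score_fact(question, best)


if __name__ == "__main__":
    # Hypothetical inputs: the first two facts would come from scene
    # understanding, the third from an external knowledge source.
    visual_facts = [
        "a red fire hydrant stands on the sidewalk",
        "a dog is sitting next to the hydrant",
    ]
    knowledge_facts = [
        "fire hydrants supply water to firefighters during fires",
    ]

    facts = build_fact_context(visual_facts, knowledge_facts)
    fact, score = select_supporting_fact(
        "what do fire hydrants supply during fires", facts
    )
    print(f"Supporting fact / explanation: {fact} (score={score:.2f})")

In this toy run the external-knowledge fact is selected, mirroring the paper's point that some questions cannot be answered from the visible scene alone.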
ISSN: 0262-8856
1872-8138
DOI: 10.1016/j.imavis.2021.104328