Sign-based image criteria for social interaction visual question answering

Bibliographic Details
Published in: Logic Journal of the IGPL, 2024-08, Vol. 32 (4), pp. 656-670
Main Authors: Chuganskaya, Anfisa A, Kovalev, Alexey K, Panov, Aleksandr I
Format: Article
Language: English
Description
Summary: Multi-modal tasks have begun to play a significant role in artificial intelligence research. Visual–linguistic tasks, such as visual question answering, are a particular example of this domain. The progress of modern machine learning systems is determined, among other things, by the data on which they are trained. Most modern visual question answering data sets contain only a limited range of question types, which can be answered either by directly accessing the image itself or by using external data. At the same time, insufficient attention is paid to social interactions between people, which limits the scope of visual question answering systems. In this paper, we propose criteria, based on psychological research, for selecting images that are suitable for composing social interaction visual question answering questions. We believe these criteria should help advance visual question answering systems.
ISSN: 1367-0751, 1368-9894
DOI: 10.1093/jigpal/jzae026