Merging Patches and Tokens: A VQA System for Remote Sensing
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Summary: | In this paper, we investigate the integration of transformer-based feature extractors in a Remote Sensing Visual Question Answering (RSVQA) framework. Our findings demonstrate an improvement to the baseline achieved through additional attention modules after feature extraction and using MUTAN (Multimodal Tucker Fusion). Further, we delve into the potential of multi-task learning, observing a considerable boost in performance when feature extractors are trained. Our results suggest a promising future research avenue in multitask learning for RSVQA, while also emphasizing the need for careful selection of hyperparameters per question type as well as finding the proper balance for training the shared backbone and individual classifiers simultaneously to further improve performance. |
ISSN: | 2153-7003 |
DOI: | 10.1109/IGARSS53475.2024.10641975 |