
CrossVG: Visual Grounding in Remote Sensing with Modality-Guided Interactions


Bibliographic Details
Main Authors: Choudhury, Shabnam, Kurkure, Pratham, Talwar, Priyanka, Banerjee, Biplab
Format: Conference Proceeding
Language: English
Description
Summary: Visual grounding aims to locate specific objects in an image from a natural language expression, producing either a bounding box or a segmentation mask. The vision research community has investigated this objective extensively. Nevertheless, existing benchmark datasets and methods predominantly target natural images rather than remote sensing images. Remote sensing images differ from natural images in that they cover expansive scenes and carry geographic spatial information about ground objects. Current approaches address the task by extending a standard object detection framework. Yet visual features extracted at such predetermined locations may be of limited use, since they only partially exploit the visual context and the attribute information carried by the text query, restricting the depth of visual-linguistic interaction. To overcome this limitation, we propose a novel architecture, CrossVG, which builds multi-modal correspondence through visual- and language-guided cross-modality interactions. Through experiments, we demonstrate that a simple stack of transformer encoder layers can replace complex fusion modules while performing better. We validate the efficacy of the proposed model and achieve state-of-the-art performance on the benchmark dataset RSVGD.
ISSN: 2153-7003
DOI: 10.1109/IGARSS53475.2024.10642183
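The abstract's central claim is that a plain stack of transformer encoder layers, applied jointly to visual and text tokens, can replace specialized cross-modal fusion modules. The sketch below illustrates that idea under stated assumptions: the class name, dimensions, and the use of learned modality embeddings to tag visual versus text tokens are hypothetical and not taken from the paper, which is not publicly detailed here.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Minimal sketch (not the paper's implementation): fuse visual and
    text tokens with a standard stack of transformer encoder layers, so
    that self-attention spans both modalities at once."""

    def __init__(self, d_model=256, nhead=8, num_layers=6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Learned embeddings marking each token's modality (assumption):
        # index 0 = visual token, index 1 = text token.
        self.modality_embed = nn.Embedding(2, d_model)

    def forward(self, visual_tokens, text_tokens):
        # visual_tokens: (B, Nv, d), text_tokens: (B, Nt, d)
        num_visual = visual_tokens.shape[1]
        vis = visual_tokens + self.modality_embed.weight[0]
        txt = text_tokens + self.modality_embed.weight[1]
        # Concatenate along the sequence axis so every encoder layer
        # performs joint cross-modality self-attention.
        tokens = torch.cat([vis, txt], dim=1)
        fused = self.encoder(tokens)
        # Split back into per-modality streams for downstream heads
        # (e.g. a box-regression head over the visual tokens).
        return fused[:, :num_visual], fused[:, num_visual:]
```

A grounding head (bounding-box regression or mask prediction) would then consume the fused visual tokens; that head is outside this sketch.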