Text-Image Matching for Cross-Modal Remote Sensing Image Retrieval via Graph Neural Network
Published in: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023, Vol. 16, pp. 812-824 |
---|---|
Format: | Article |
Language: | English |
Summary: | The rapid development of remote sensing (RS) technology has produced massive volumes of images, which makes it difficult to obtain interpretation results by manual screening. Therefore, researchers have begun to develop automatic retrieval methods for RS images. In recent years, cross-modal RS image retrieval based on query text has attracted many researchers because of its flexibility and has become a new research trend. However, the primary problem it faces is that the information in the query text and the RS image is not aligned. For example, RS images are often multiscale and contain multiple objects, so their information is rich, while the query text contains only a few words, so its information is scarce. Recently, graph neural networks (GNNs) have shown their potential in many tasks owing to their powerful feature representation ability. Therefore, based on GNNs, this article proposes a new cross-modal RS feature matching network, which can avoid the degradation of retrieval performance caused by information misalignment by learning the feature interactions within the query text and within the RS image, respectively, and by modeling the feature association between the two modalities. Specifically, to fuse within-modal features, text and RS image graph modules are designed based on GNNs. In addition, to effectively match the query text and the RS image, an image-text association module is constructed with the multihead attention mechanism to focus on the parts of the text that are related to the RS image. Experiments on two public standard datasets verify the competitive performance of the proposed model. |
ISSN: | 1939-1404, 2151-1535 |
DOI: | 10.1109/JSTARS.2022.3231851 |
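The summary describes three parts: a graph module per modality that fuses within-modal features, a multihead-attention association module that focuses on the text parts related to the image, and a matching score. The sketch below illustrates that data flow in plain Python under loudly stated assumptions; it is not the authors' implementation. All function names are hypothetical, the graph is assumed fully connected with softmax-normalized dot-product edge weights, no learned weights are used (a real model would train projection matrices), attention uses identity per-head projections with keys doubling as values, and the final score (mean cosine between each image region and its attended text context) is an assumed choice.

```python
import math
from typing import List

Matrix = List[List[float]]  # one feature vector per node (image region or word)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_module(nodes: Matrix) -> Matrix:
    """Hypothetical within-modal graph step: one weight-free message pass.

    Assumption: a fully connected graph whose edge weights are a row-wise
    softmax of dot-product similarities; each node adds the weighted sum of
    all node features to itself (residual update).
    """
    out = []
    for xi in nodes:
        weights = softmax([dot(xi, xj) for xj in nodes])
        msg = [sum(w * xj[k] for w, xj in zip(weights, nodes)) for k in range(len(xi))]
        out.append([a + b for a, b in zip(xi, msg)])
    return out


def multihead_attention(queries: Matrix, keys: Matrix, heads: int) -> Matrix:
    """Scaled dot-product attention split over `heads` channel groups.

    Assumption: identity projections per head (no learned weights), and the
    keys also serve as the values.
    """
    d = len(queries[0])
    assert d % heads == 0, "feature size must divide evenly into heads"
    dh = d // heads
    out = []
    for q in queries:
        row: List[float] = []
        for h in range(heads):
            s = slice(h * dh, (h + 1) * dh)  # channels owned by this head
            weights = softmax([dot(q[s], k[s]) / math.sqrt(dh) for k in keys])
            row.extend(sum(w * k[s][c] for w, k in zip(weights, keys)) for c in range(dh))
        out.append(row)
    return out


def cosine(a: List[float], b: List[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def match_score(regions: Matrix, words: Matrix, heads: int = 2) -> float:
    """Hypothetical image-text score: each image region attends over the
    query words, and the score is the mean region/context cosine."""
    img = graph_module(regions)  # image graph module
    txt = graph_module(words)    # text graph module
    context = multihead_attention(img, txt, heads)  # image-text association
    return sum(cosine(r, c) for r, c in zip(img, context)) / len(img)


if __name__ == "__main__":
    # Toy 4-dimensional features, invented purely for illustration.
    regions = [[1.0, 0.0, 1.0, 0.0], [0.9, 0.1, 0.8, 0.0], [0.0, 1.0, 0.0, 1.0]]
    related = [[1.0, 0.0, 0.9, 0.1], [0.0, 0.9, 0.1, 1.0]]
    unrelated = [[-1.0, 0.0, -1.0, 0.0]]
    print(match_score(regions, related), match_score(regions, unrelated))
```

With these toy vectors the related query scores higher than the unrelated one. Letting image regions act as queries over the words is one way to read "focus on the parts related to RS image in the text"; a trained model would learn the graph edges and attention projections rather than fixing them.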