Loading…

Recurrent Attention and Semantic Gate for Remote Sensing Image Captioning

The remote sensing image captioning has attracted wide spread attention in remote sensing field due to its application potentiality. However, most existing approaches model limited interactions between image content and sentence and fail to exploit special characteristics of the remote sensing image...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on geoscience and remote sensing 2022, Vol.60, p.1-16
Main Authors: Li, Yunpeng, Zhang, Xiangrong, Gu, Jing, Li, Chen, Wang, Xin, Tang, Xu, Jiao, Licheng
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The remote sensing image captioning has attracted wide spread attention in remote sensing field due to its application potentiality. However, most existing approaches model limited interactions between image content and sentence and fail to exploit special characteristics of the remote sensing images. We introduce a novel recurrent attention and semantic gate (RASG) framework to facilitate the remote sensing image captioning in this article, which integrates competitive visual features and a recurrent attention mechanism to generate a better context vector for the images every time as well as enhances the representations of the current word state. Specifically, we first project each image into competitive visual features by taking the advantage of both static visual features and multiscale features. Then, a novel recurrent attention mechanism is developed to extract the high-level attentive maps from encoded features and nonvisual features, which can help the decoder recognize and focus on the effective information for understanding the complex content of the remote sensing images. Finally, the hidden states from the long short-term memory (LSTM) and other semantic references are incorporated into a semantic gate, which contributes to more comprehensive and precise semantic understanding. Comprehensive experiments on three widely used datasets, Sydney-Captions, UCM-Captions, and Remote Sensing Image Captioning Dataset, have demonstrated the superiority of the proposed RASG over a series of attentive models based on image captioning methods.
ISSN:0196-2892
1558-0644
DOI:10.1109/TGRS.2021.3102590