From plane to hierarchy: Deformable Transformer for Remote Sensing Image Captioning
Published in: | IEEE journal of selected topics in applied earth observations and remote sensing 2023-01, Vol.16, p.1-14 |
---|---|
Main Authors: | , , , , , , |
Format: | Article |
Language: | English |
Summary: | With the growth of remote sensing imagery, automatically understanding image content has attracted considerable interest from researchers applying deep learning to remote sensing images. Inspired by natural image captioning, models with a CNN-RNN backbone supplemented by attention have been widely used in remote sensing image captioning. However, it is inefficient for the current attention layer to simultaneously mine the hidden foreground from the background of a remote sensing image and perform interactive feature learning. Meanwhile, newer mainstream language models have recently surpassed the traditional LSTM in sentence generation. To address these problems, in this paper we propose a novel idea: making flat remote sensing images stereoscopic by separating the foreground and background. Based on this hierarchical image information, we design a novel Deformable Transformer equipped with deformable scaled dot-product attention to learn multi-scale features from the foreground and background through its powerful interactive learning ability. Evaluations are conducted on four classic remote sensing image captioning datasets. Compared with state-of-the-art methods, our Transformer variant achieves higher captioning accuracy. |
---|---|
ISSN: | 1939-1404 2151-1535 |
DOI: | 10.1109/JSTARS.2023.3305889 |