A Siamese Network Based on Multiple Attention and Multilayer Transformers for Change Detection
Published in: IEEE Transactions on Geoscience and Remote Sensing, 2023, Vol. 61, pp. 1-15
Main Authors: , , ,
Format: Article
Language: English
Summary: Deep learning (DL) networks have demonstrated promising performance in high-resolution remote sensing (RS) image change detection (CD). Transformers can enhance features and capture global semantic relations, and they have been applied to CD in high-resolution RS images with good results. However, the depth of the transformer is limited and the extracted features are not sufficiently representative, which leaves the performance of CD models unsatisfactory. To address this problem, we propose a Siamese network based on multiple attention and multilayer transformers (SMART) for CD in this article. It is a Siamese network containing three modules, which processes bitemporal images in parallel and extracts enhanced features at different levels. The first is the feature extraction module: it expresses the features as a fixed number of high-order semantic features through the spatial attention module (SPAM), then computes the semantic relations among these high-order semantic features with the transformer encoder, which greatly improves computational efficiency. The second is the feature enhancement module: it computes global semantic relations with a self-attention module (SFAM); the multilayer encoder obtains enhanced features at different levels by computing the relationships between features at each layer, and the multilayer decoder refines the bitemporal features of each layer and projects them back to the original space. The third is the fusion module: it uses the ensemble channel attention module (ECAM) to elaborate the feature differences at different levels. The proposed SMART model has been compared with state-of-the-art CD methods on three publicly available datasets, and the results confirm that SMART outperforms them on several evaluation metrics. Our code is available at https://github.com/TwJ-IGG/SMART
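The self-attention computation at the heart of the feature enhancement module (SFAM) and the transformer encoder can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation (which is at the linked repository and uses learned query/key/value projections over image features); it only shows the scaled dot-product attention that enhances each feature vector with a weighted sum over all others.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of feature vectors.

    For clarity the query/key/value projections are identity maps; a real
    transformer layer (as in SMART) would use learned linear projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this query against every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Each output is a convex combination of all input vectors,
        # so every feature is "enhanced" with global context.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

# Toy example: three image-patch features as 2-D vectors.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enhanced = self_attention(feats)
```

In a Siamese CD setting, the same attention weights would be computed independently for each of the two temporal images before their features are differenced and fused.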
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2023.3325220