
A Siamese Network Based on Multiple Attention and Multilayer Transformers for Change Detection


Bibliographic Details
Published in: IEEE Transactions on Geoscience and Remote Sensing, 2023, Vol. 61, pp. 1-15
Main Authors: Tang, Wenjie; Wu, Ke; Zhang, Yuxiang; Zhan, Yanting
Format: Article
Language: English
Description
Summary: Deep learning (DL) networks have demonstrated promising performance in high-resolution remote sensing (RS) image change detection (CD). The transformer can enhance features and capture global semantic relations, and it has been applied to the CD problem for high-resolution RS images with good results. However, the depth of the transformer is limited and the extracted features are not sufficiently representative, which leaves the performance of the CD model unsatisfactory. To address this problem, we propose a Siamese network based on multiple attention and multilayer transformers (SMART) for CD in this article. It is a Siamese network containing three different modules, which processes bitemporal images in parallel and extracts enhanced features at different levels. The first is the feature extraction module. It expresses the features as a certain number of high-order semantic features through the spatial attention module (SPAM) and then computes the semantic relations between these high-order semantic features with the transformer encoder, which greatly improves computational efficiency. The second is the feature enhancement module. It computes global semantic relations with a self-attention module (SFAM): the multilayer encoder obtains enhanced features at different levels by computing the relationships between features at each layer, and the multilayer decoder refines the bitemporal features of each layer and projects them back to the original space. The third is the fusion module. It uses the ensemble channel attention module (ECAM) to exploit the feature differences at different levels. The proposed SMART model is compared with several state-of-the-art CD methods on three publicly available datasets, and the results confirm that SMART outperforms them on several evaluation metrics. Our code is available at https://github.com/TwJ-IGG/SMART
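As a rough illustration of the Siamese-with-attention pattern the summary describes, the sketch below uses NumPy to run a shared (weight-tied) projection over two temporal images, gate the features with a simple spatial attention, mix them with single-head self-attention, and fuse the absolute difference through a channel attention. All function names, the linear stand-in for the CNN backbone, and the pooling-based attention gates are hypothetical simplifications for illustration only; SMART's actual SPAM, SFAM, and ECAM layers are defined in the linked repository.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat):
    # feat: (C, H, W). Pool across channels (mean and max), then gate each
    # spatial location; a crude stand-in for a learned spatial-attention conv.
    avg = feat.mean(axis=0, keepdims=True)   # (1, H, W)
    mx = feat.max(axis=0, keepdims=True)     # (1, H, W)
    return feat * sigmoid(avg + mx)

def self_attention(tokens):
    # tokens: (N, D). Single-head scaled dot-product attention with
    # query = key = value = tokens (no learned projections in this sketch).
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)        # row-wise softmax
    return w @ tokens

def channel_attention(feat):
    # feat: (C, H, W). Global average pool per channel, then gate channels.
    gate = sigmoid(feat.mean(axis=(1, 2)))[:, None, None]  # (C, 1, 1)
    return feat * gate

def siamese_change_map(img_a, img_b, weights):
    # Siamese property: both temporal images pass through the *same* weights.
    # weights: (C_out, C_in); images: (C_in, H, W).
    feat_a = np.einsum('oc,chw->ohw', weights, img_a)
    feat_b = np.einsum('oc,chw->ohw', weights, img_b)
    outs = []
    for feat in (feat_a, feat_b):
        feat = spatial_attention(feat)                     # spatial gating
        c, h, w = feat.shape
        tokens = self_attention(feat.reshape(c, -1).T)     # (H*W, C) tokens
        outs.append(tokens.T.reshape(c, h, w))             # back to (C, H, W)
    diff = channel_attention(np.abs(outs[0] - outs[1]))    # fuse by |difference|
    return diff.mean(axis=0)                               # (H, W) change scores
```

Because the two branches share weights and follow the identical deterministic path, feeding the same image twice yields an all-zero change map, which is a quick sanity check for any Siamese CD pipeline.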
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2023.3325220