Adaptive enhanced Swin Transformer with U-Net for remote sensing image segmentation
Published in: Computers & Electrical Engineering, 2022-09, Vol. 102, p. 108223, Article 108223
Format: Article
Language: English
Summary:
•In this paper, a UAV remote sensing segmentation method based on CNN and Transformer is proposed. Building on the U-Net structure, we introduce a hybrid CNN-Transformer encoder and a symmetric CNN decoder, which effectively extract and exploit global and local semantic information to obtain accurate segmentation while reducing the computational cost of the Transformer. In addition, we construct an adaptive multiscale Transformer module and strengthen its multi-head self-attention to boost the performance of AESwin-UNet. Experimental results on two UAV remote sensing datasets show that AESwin-UNet achieves excellent performance. Our contributions can be summarized as:
•Based on a hybrid CNN-Transformer, a U-shaped encoder-decoder model with skip connections is proposed, which realizes pixel-level segmentation prediction by fusing local and global features while reducing the scale of pre-training.
•An enhanced Swin Transformer block with an attention module is constructed, which enhances the extraction of effective features by reducing redundancy in multi-head self-attention (MHSA).
•A deformable adaptive patch merging layer is proposed to assign appropriate receptive fields to different targets while achieving down-sampling.
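For orientation, the standard Swin patch merging operation that the proposed deformable adaptive layer builds on can be sketched in NumPy. The paper's deformable, adaptive sampling of neighborhoods is not detailed in this record, so this shows only the baseline 2x down-sampling; the function name and the random reduction weights are illustrative, not the authors' implementation.

```python
import numpy as np

def patch_merging(x, rng=None):
    """Baseline Swin-style patch merging on a (H, W, C) feature map.

    Groups each 2x2 spatial neighborhood, concatenates its channels
    (C -> 4C), then reduces with a linear projection (4C -> 2C),
    halving spatial resolution while doubling channel width.
    """
    rng = rng or np.random.default_rng(0)
    H, W, C = x.shape
    # The four interleaved sub-grids of the 2x2 neighborhoods.
    tl, tr = x[0::2, 0::2], x[0::2, 1::2]
    bl, br = x[1::2, 0::2], x[1::2, 1::2]
    merged = np.concatenate([tl, bl, tr, br], axis=-1)   # (H/2, W/2, 4C)
    # Illustrative random linear reduction; learned in a real model.
    W_red = rng.standard_normal((4 * C, 2 * C)) / np.sqrt(4 * C)
    return merged @ W_red                                # (H/2, W/2, 2C)
```

A deformable variant would replace the fixed 2x2 sampling grid with learned, per-target offsets so that each merged token draws from an adaptive receptive field.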
Semantic segmentation of remote sensing images often faces complex situations, such as objects at variable scales, large intra-class differences, and imbalanced class distributions. Convolutional Neural Network (CNN) based models have been widely used in remote sensing image segmentation tasks because of their powerful feature extraction capability, but the intrinsic locality of CNN architectures makes it difficult to capture long-range dependencies among image patches. Recently, the Transformer, which leverages long-range dependencies, has performed well in computer vision tasks. To take advantage of both CNN and Transformer, a novel Adaptive Enhanced Swin Transformer with U-Net (AESwin-UNet) is proposed for remote sensing segmentation. AESwin-UNet uses a hybrid Transformer-based U-shaped encoder-decoder architecture with skip connections to extract local and global semantic features. Specifically, the Enhanced Swin Transformer (E-Swin Transformer) in the encoder contains Enhanced Multi-head Self-Attention and a Deformable Adaptive Patch Merging layer. A symmetric cascaded decoder is designed for up-sampling to obtain higher-resolution feature maps. Experiments on two public benchmark datasets, WHDLD and LoveDA, demonstrate that the proposed AESwin-UNet performs well.
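As background for the Enhanced Multi-head Self-Attention mentioned above, the plain window-based MHSA it extends (the core Swin Transformer mechanism) can be sketched in NumPy. The enhancement itself is not specified in this record; all function names and the random projection weights below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def window_partition(x, ws):
    """Split a (H, W, C) map into non-overlapping (ws x ws) windows
    -> (num_windows, ws*ws, C)."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_mhsa(x, ws=4, num_heads=2, rng=None):
    """Multi-head self-attention applied independently inside each window,
    so attention cost scales with window size rather than image size."""
    rng = rng or np.random.default_rng(0)
    H, W, C = x.shape
    hd = C // num_heads  # per-head dimension
    # Illustrative random projections; learned in a real model.
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    win = window_partition(x, ws)           # (nW, N, C), N = ws*ws
    nW, N, _ = win.shape
    def heads(t):                           # (nW, N, C) -> (nW, heads, N, hd)
        return t.reshape(nW, N, num_heads, hd).transpose(0, 2, 1, 3)
    q, k, v = heads(win @ Wq), heads(win @ Wk), heads(win @ Wv)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(hd))
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(nW, N, C)
    return out
```

The paper's Enhanced MHSA is described as reducing redundancy among the resulting attention heads; this sketch shows only the standard windowed computation it starts from.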
ISSN: 0045-7906; 1879-0755
DOI: 10.1016/j.compeleceng.2022.108223