P-Swin: Parallel Swin transformer multi-scale semantic segmentation network for land cover classification
Published in: | Computers & geosciences 2023-06, Vol.175, p.105340, Article 105340 |
---|---|
Main Authors: | , , , , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | With the recent development of remote sensing technology and deep learning, semantic segmentation methods have been increasingly used in land cover classification. However, these methods face the challenge of incomplete recognition caused by large differences in the scale of ground objects. Owing to multi-head self-attention, the Swin Transformer Network (Swin) has a large receptive field at its shallow level, which is conducive to the identification of large-scale objects. However, Swin does not fully mine the context information of features, which can lead to incomplete recognition. Based on Swin, we propose a parallel window-based Transformer network, the Parallel Swin Transformer Network (P-Swin). The core of P-Swin is the Parallel Swin Transformer Block (PST Block), which consists of Window-based Self Attention Interaction (WSAI) and a Feed Forward Network (FFN). WSAI not only computes the relationships within windows but also establishes relationships between windows, which improves the network's ability to obtain feature context information. P-Swin outperformed Swin and reached the highest level of accuracy, with 76.42% mIoU on the test set of the ISPRS Potsdam 2D dataset (Swin: 75.95%), 65.13% mIoU on the test set of the Gaofen Image Dataset (Swin: 63.41%), and 64.61% mIoU on the test set of the WHDLD dataset (Swin: 63.01%).
• A self-attention block is proposed for mining context information and semantic information.
• P-Swin obtains the context information of features to achieve more complete results.
• P-Swin can improve the integrity of the segmentation results of multi-scale objects.
• P-Swin performs well on three datasets and outperforms Swin.
• P-Swin is a reliable semantic segmentation method for RS images. |
---|---|
ISSN: | 0098-3004 1873-7803 |
DOI: | 10.1016/j.cageo.2023.105340 |
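
The Summary reports every result as mIoU (mean intersection over union). As a minimal sketch of how that metric is commonly computed from a pixel-level confusion matrix, the example below may help when reading the figures. The function names `confusion_matrix` and `mean_iou` and the toy labels are hypothetical and not taken from the article. The article's exact evaluation protocol (for example, whether any class is excluded from the average) is not stated in this record, so this should not be read as the authors' implementation.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are reference classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average the per-class IoU = TP / (TP + FP + FN).

    Assumption: classes that appear in neither the reference nor the
    prediction are skipped rather than counted as zero.
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                         # reference c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp    # predicted c, reference other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps (hypothetical data, not from the article).
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(reference, predicted, num_classes=3)
score = mean_iou(cm)
print(f"mIoU = {score:.2%}")
```

On the toy data the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Because each class contributes equally to the average regardless of how many pixels it covers, mIoU is sensitive to how completely small or rare classes are recognized.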