P-Swin: Parallel Swin transformer multi-scale semantic segmentation network for land cover classification

Bibliographic Details
Published in: Computers & Geosciences, 2023-06, Vol. 175, Article 105340
Main Authors: Wang, Di, Yang, Ronghao, Zhang, Zhenxin, Liu, Hanhu, Tan, Junxiang, Li, Shaoda, Yang, Xiaoxia, Wang, Xiao, Tang, Kangqi, Qiao, Yichun, Su, Po
Format: Article
Language: English
Description
Summary: With the recent development of remote sensing technology and deep learning, semantic segmentation methods have been increasingly used in land cover classification. However, these methods face the challenge of incomplete recognition caused by large differences in the scale of ground objects. Owing to multi-head self-attention, the Swin Transformer Network (Swin) has a large receptive field even at its shallow levels, which aids the identification of large-scale objects. However, Swin does not fully mine the contextual information of features, which easily leads to incomplete recognition. Building on Swin, we propose a parallel window-based Transformer network, the Parallel Swin Transformer Network (P-Swin). The core of P-Swin is the Parallel Swin Transformer Block (PST Block), which comprises Window-based Self Attention Interaction (WSAI) and a Feed Forward Network (FFN). WSAI not only computes relationships within windows but also establishes relationships between windows, improving the network's ability to capture the contextual information of features. P-Swin outperformed Swin and achieved the highest scores on all three benchmarks: 76.42% mIoU on the ISPRS Potsdam 2D test set (Swin: 75.95%), 65.13% mIoU on the Gaofen Image Dataset test set (Swin: 63.41%), and 64.61% mIoU on the WHDLD test set (Swin: 63.01%).

Highlights:
•A self-attention block is proposed for mining context and semantic information.
•P-Swin obtains the contextual information of features to achieve more complete results.
•P-Swin improves the integrity of segmentation results for multi-scale objects.
•P-Swin performs well on three datasets and outperforms Swin.
•P-Swin is a reliable semantic segmentation method for remote sensing images.
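To make the parallel design described in the summary more concrete, the sketch below pairs a within-window self-attention branch with a between-window branch (window summaries attending to each other) inside one block, followed by an FFN. This is only a minimal reading of the abstract, not the authors' implementation: the class and helper names (PSTBlockSketch, window_partition), the mean-pooling of window summaries, and the additive fusion of the two branches are assumptions made for illustration.

import torch
import torch.nn as nn


def window_partition(x, win):
    """Split (B, H, W, C) feature maps into non-overlapping windows: (B*nW, win*win, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // win, win, W // win, win, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win * win, C)


def window_reverse(windows, win, H, W, C):
    """Inverse of window_partition: (B*nW, win*win, C) back to (B, H, W, C)."""
    B = windows.shape[0] // ((H // win) * (W // win))
    x = windows.view(B, H // win, W // win, win, win, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


class PSTBlockSketch(nn.Module):
    """Hypothetical parallel block: a within-window attention branch and a
    between-window attention branch run in parallel, are fused by addition,
    and feed an FFN. An assumed reading of the abstract, not the paper's WSAI."""

    def __init__(self, dim, win=8, heads=4):
        super().__init__()
        self.win = win
        self.norm1 = nn.LayerNorm(dim)
        self.intra_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        # x: (B, H, W, C); H and W assumed divisible by the window size
        B, H, W, C = x.shape
        shortcut = x
        x = self.norm1(x)
        nW = (H // self.win) * (W // self.win)

        # Branch 1: relationships *within* each window (standard window attention).
        wins = window_partition(x, self.win)                 # (B*nW, win*win, C)
        intra, _ = self.intra_attn(wins, wins, wins)
        intra = window_reverse(intra, self.win, H, W, C)     # (B, H, W, C)

        # Branch 2: relationships *between* windows. Each window is summarized by
        # mean pooling, the summaries attend to one another, and the result is
        # broadcast back over the window it came from (assumption).
        pooled = window_partition(x, self.win).mean(dim=1).view(B, nW, C)
        inter, _ = self.inter_attn(pooled, pooled, pooled)   # (B, nW, C)
        inter = inter.view(B, H // self.win, W // self.win, 1, 1, C)
        inter = inter.expand(B, H // self.win, W // self.win, self.win, self.win, C)
        inter = inter.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

        x = shortcut + intra + inter                         # parallel fusion by addition (assumption)
        x = x + self.ffn(self.norm2(x))
        return x


# Tiny shape check on random data.
if __name__ == "__main__":
    block = PSTBlockSketch(dim=96, win=8, heads=4)
    out = block(torch.randn(2, 64, 64, 96))
    print(out.shape)  # torch.Size([2, 64, 64, 96])

The design intent this sketch tries to capture is that the between-window branch gives shallow layers access to context beyond a single local window, which the abstract credits for more complete recognition of large-scale objects.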
ISSN: 0098-3004, 1873-7803
DOI: 10.1016/j.cageo.2023.105340