Loading…

SLMFNet: Enhancing land cover classification of remote sensing images through selective attentions and multi-level feature fusion

Land cover classification (LCC) is of paramount importance for assessing environmental changes in remote sensing images (RSIs) as it involves assigning categorical labels to ground objects. The growing availability of multi-source RSIs presents an opportunity for intelligent LCC through semantic seg...

Full description

Saved in:
Bibliographic Details
Published in:PloS one 2024-05, Vol.19 (5), p.e0301134-e0301134
Main Authors: Li, Xin, Zhao, Hejing, Wu, Dan, Liu, Qixing, Tang, Rui, Li, Linyang, Xu, Zhennan, Lyu, Xin
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Land cover classification (LCC) is of paramount importance for assessing environmental changes in remote sensing images (RSIs) as it involves assigning categorical labels to ground objects. The growing availability of multi-source RSIs presents an opportunity for intelligent LCC through semantic segmentation, offering a comprehensive understanding of ground objects. Nonetheless, the heterogeneous appearances of terrains and objects contribute to significant intra-class variance and inter-class similarity at various scales, adding complexity to this task. In response, we introduce SLMFNet, an innovative encoder-decoder segmentation network that adeptly addresses this challenge. To mitigate the sparse and imbalanced distribution of RSIs, we incorporate selective attention modules (SAMs) aimed at enhancing the distinguishability of learned representations by integrating contextual affinities within spatial and channel domains through a compact number of matrix operations. Precisely, the selective position attention module (SPAM) employs spatial pyramid pooling (SPP) to resample feature anchors and compute contextual affinities. In tandem, the selective channel attention module (SCAM) concentrates on capturing channel-wise affinity. Initially, feature maps are aggregated into fewer channels, followed by the generation of pairwise channel attention maps between the aggregated channels and all channels. To harness fine-grained details across multiple scales, we introduce a multi-level feature fusion decoder with data-dependent upsampling (MLFD) to meticulously recover and merge feature maps at diverse scales using a trainable projection matrix. Empirical results on the ISPRS Potsdam and DeepGlobe datasets underscore the superior performance of SLMFNet compared to various state-of-the-art methods. Ablation studies affirm the efficacy and precision of SAMs in the proposed model.
ISSN:1932-6203
1932-6203
DOI:10.1371/journal.pone.0301134