Integrating Spatial Details With Long-Range Contexts for Semantic Segmentation of Very High-Resolution Remote-Sensing Images

Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2023, Vol. 20, pp. 1-5
Main Authors: Long, Jiang; Li, Mengmeng; Wang, Xiaoqin
Format: Article
Language:English
Description
Summary: This letter presents a cross-learning network (CLCFormer) that integrates fine-grained spatial details with long-range global contexts, based on convolutional neural networks (CNNs) and a transformer, for semantic segmentation of very high-resolution (VHR) remote-sensing images. More specifically, CLCFormer comprises two parallel encoders, derived from the CNN and the transformer, and a CNN decoder. The encoders use SwinV2 and EfficientNet-B3 as backbones, and the semantic features they extract are aggregated at multiple levels using a bilateral feature fusion module (BiFFM). First, we used attention gate (ATG) modules to enhance feature representation, improving segmentation results for objects with various shapes and sizes. Second, we used an attention residual (ATR) module to refine spatial feature learning, alleviating boundary blurring of occluded objects. Finally, we developed a new strategy, called the auxiliary supervision strategy (ASS), for model optimization to further improve segmentation performance. Our method was tested on the WHU, Inria, and Potsdam datasets and compared with CNN-based and transformer-based methods. Results showed that our method achieved state-of-the-art performance on the WHU building dataset (92.31% IoU), the Inria building dataset (83.71% IoU), and the Potsdam dataset (80.27% mIoU). We concluded that CLCFormer is a flexible, robust, and effective method for the semantic segmentation of VHR images. The code of the proposed model is available at https://github.com/long123524/CLCFormer.
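To make the bilateral fusion idea concrete, the following is a minimal NumPy sketch of gating one branch's feature map by an attention map derived from both branches and summing the two gated results. This is an illustrative toy only: the function names (`attention_gate`, `biffm_fuse`), the fixed element-wise sigmoid gating, and the tensor shapes are assumptions for exposition; the actual ATG and BiFFM modules in CLCFormer use learned convolutional layers, for which the repository linked above is the authoritative reference.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(feat, context):
    # Illustrative gate: derive attention coefficients from the sum of the
    # two branches, then re-weight `feat`. (The paper's ATG learns this
    # combination with convolutions; here it is a fixed function.)
    attn = sigmoid(feat + context)
    return feat * attn

def biffm_fuse(cnn_feat, trans_feat):
    # Bilateral fusion sketch: each branch is gated using the other branch
    # as context, and the gated features are summed into one map.
    gated_cnn = attention_gate(cnn_feat, trans_feat)
    gated_trans = attention_gate(trans_feat, cnn_feat)
    return gated_cnn + gated_trans

# Toy feature maps shaped (channels, height, width); real backbone
# outputs (SwinV2 / EfficientNet-B3) would differ per stage.
rng = np.random.default_rng(0)
cnn_feat = rng.random((8, 16, 16))
trans_feat = rng.random((8, 16, 16))
fused = biffm_fuse(cnn_feat, trans_feat)
print(fused.shape)  # (8, 16, 16)
```

The design point the sketch mirrors is that fusion is symmetric: the CNN branch (rich in spatial detail) and the transformer branch (rich in long-range context) each modulate the other before aggregation, rather than one branch simply being appended to the other.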
ISSN: 1545-598X
EISSN: 1558-0571
DOI: 10.1109/LGRS.2023.3262586