Loading…

Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image

Extracting buildings and roads from remote sensing images is very important in the area of land cover monitoring, which is of great help to urban planning. Currently, a deep learning method is used by the majority of building and road extraction algorithms. However, for existing semantic segmentatio...

Full description

Saved in:
Bibliographic Details
Published in:ISPRS international journal of geo-information 2022-03, Vol.11 (3), p.165
Main Authors: Sun, Zhongyu, Zhou, Wangping, Ding, Chen, Xia, Min
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Extracting buildings and roads from remote sensing images is very important in the area of land cover monitoring, which is of great help to urban planning. Currently, a deep learning method is used by the majority of building and road extraction algorithms. However, for existing semantic segmentation, it has a limitation on the receptive field of high-resolution remote sensing images, which means that it can not show the long-distance scene well during pixel classification, and the image features is compressed during down-sampling, meaning that the detailed information is lost. In order to address these issues, Hybrid Multi-resolution and Transformer semantic extraction Network (HMRT) is proposed in this paper, by which a global receptive field for each pixel can be provided, a small receptive field of convolutional neural networks (CNN) can be overcome, and the ability of scene understanding can be enhanced well. Firstly, we blend the features by branches of different resolutions to keep the high-resolution and multi-resolution during down-sampling and fully retain feature information. Secondly, we introduce the Transformer sequence feature extraction network and use encoding and decoding to realize that each pixel has the global receptive field. The recall, F1, OA and MIoU of HMPR obtain 85.32%, 84.88%, 85.99% and 74.19%, respectively, in the main experiment and reach 91.29%, 90.41%, 91.32% and 84.00%, respectively, in the generalization experiment, which prove that the method proposed is better than existing methods.
ISSN:2220-9964
2220-9964
DOI:10.3390/ijgi11030165