Integrating Spatial Details With Long-Range Contexts for Semantic Segmentation of Very High-Resolution Remote-Sensing Images

This letter presents a cross-learning network (CLCFormer) that integrates fine-grained spatial details with long-range global contexts, building on convolutional neural networks (CNNs) and transformers, for semantic segmentation of very high-resolution (VHR) remote-sensing images. More specifically, CLCFormer comprises two parallel encoders, one CNN-based and one transformer-based, and a CNN decoder. The encoders are built on SwinV2 and EfficientNet-B3 backbones, and the extracted semantic features are aggregated at multiple levels using a bilateral feature fusion module (BiFFM). First, we used attention gate (ATG) modules to enhance feature representation, improving segmentation results for objects of various shapes and sizes. Second, we used an attention residual (ATR) module to refine the learning of spatial features, alleviating boundary blurring of occluded objects. Finally, we developed a new strategy, called the auxiliary supervise strategy (ASS), for model optimization to further improve segmentation performance. Our method was tested on the WHU, Inria, and Potsdam datasets and compared with CNN-based and transformer-based methods. Results showed that our method achieved state-of-the-art performance on the WHU building dataset (92.31% IoU), the Inria building dataset (83.71% IoU), and the Potsdam dataset (80.27% MIoU). We conclude that CLCFormer is a flexible, robust, and effective method for the semantic segmentation of VHR images. The code of the proposed model is available at https://github.com/long123524/CLCFormer.
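
The abstract describes a dual-encoder layout that a small sketch may make concrete. Below is a minimal, hypothetical PyTorch rendering of the idea: a CNN branch for fine spatial detail, a transformer branch for long-range context, and per-level fusion before a decoder head. The class names (BiFFMSketch, CrossLearningSketch), channel widths, and all module internals are illustrative assumptions, not the authors' implementation; their released code is at the GitHub link above. Real backbones would be EfficientNet-B3 and SwinV2 (e.g., via timm); tiny stand-in convolutions keep the example self-contained.

```python
# Hedged sketch (not the authors' code) of a dual-encoder "cross-learning"
# segmentation network: CNN branch for spatial detail, transformer branch
# for global context, fused per level, decoded to per-pixel logits.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiFFMSketch(nn.Module):
    """Hypothetical bilateral feature fusion: project both branches to a
    common width, align resolutions, concatenate, and mix with a 3x3 conv."""

    def __init__(self, cnn_ch: int, trans_ch: int, out_ch: int):
        super().__init__()
        self.proj_cnn = nn.Conv2d(cnn_ch, out_ch, kernel_size=1)
        self.proj_trans = nn.Conv2d(trans_ch, out_ch, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_cnn: torch.Tensor, f_trans: torch.Tensor) -> torch.Tensor:
        # Resample transformer features to the CNN branch's spatial size.
        f_trans = F.interpolate(
            self.proj_trans(f_trans), size=f_cnn.shape[2:],
            mode="bilinear", align_corners=False,
        )
        return self.fuse(torch.cat([self.proj_cnn(f_cnn), f_trans], dim=1))


class CrossLearningSketch(nn.Module):
    """Two parallel encoders plus fusion feeding a simple decoder head."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Stand-ins for EfficientNet-B3 / SwinV2 feature extractors.
        self.cnn_branch = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
        self.trans_branch = nn.Conv2d(3, 64, kernel_size=3, stride=4, padding=1)
        self.biffm = BiFFMSketch(cnn_ch=32, trans_ch=64, out_ch=48)
        self.head = nn.Conv2d(48, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = self.biffm(self.cnn_branch(x), self.trans_branch(x))
        # Upsample class logits back to the input resolution.
        return F.interpolate(self.head(fused), size=x.shape[2:],
                             mode="bilinear", align_corners=False)


logits = CrossLearningSketch(num_classes=6)(torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 6, 256, 256])
```

In the paper, fusion happens at multiple encoder levels and the decoder is deeper; the sketch collapses this to one level purely to show the data flow.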

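The reported scores are intersection over union (IoU) for the single-class building datasets (WHU, Inria) and mean IoU (MIoU) over classes for Potsdam. As a quick reference, here is a minimal sketch of the metric; the function name and toy label maps are made up for illustration.

```python
# Sketch of IoU / mean-IoU on integer label maps: per class, the ratio of
# correctly overlapping pixels to the union of predicted and true pixels,
# then averaged over the classes present.
import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))


pred = np.array([[0, 0, 1], [1, 1, 1], [0, 0, 0]])
target = np.array([[0, 1, 1], [1, 1, 0], [0, 0, 0]])
# class 0: inter 4 / union 6; class 1: inter 3 / union 5 -> mean ~0.633
print(round(mean_iou(pred, target, num_classes=2), 3))  # 0.633
```
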
Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2023, Vol. 20, pp. 1-5
Main Authors: Long, Jiang; Li, Mengmeng; Wang, Xiaoqin
Format: Article
Language: English
DOI: 10.1109/LGRS.2023.3262586
ISSN: 1545-598X
EISSN: 1558-0571
Source: IEEE Xplore (Online service)
Subjects:
Artificial neural networks
Auxiliary supervise
Blurring
Buildings
CLCFormer
Coders
Convolution
Convolutional neural networks
convolutional neural networks (CNNs)
Datasets
Feature extraction
High resolution
Image processing
Image resolution
Image segmentation
Learning
Methods
Modules
Neural networks
Optimization
Remote sensing
Resolution
Semantic segmentation
Semantics
Spatial discrimination learning
Tiles
transformer
Transformers
very high-resolution (VHR) images