Integrating Spatial Details With Long-Range Contexts for Semantic Segmentation of Very High-Resolution Remote-Sensing Images

This letter presents a cross-learning network (CLCFormer) that integrates fine-grained spatial details with long-range global contexts, building on convolutional neural networks (CNNs) and transformers, for semantic segmentation of very high-resolution (VHR) remote-sensing images. More specifically, CLCFormer comprises two parallel encoders, one CNN-based and one transformer-based, and a CNN decoder. The encoders are built on SwinV2 and EfficientNet-B3 backbones, and the extracted semantic features are aggregated at multiple levels using a bilateral feature fusion module (BiFFM). First, we used attention gate (ATG) modules to enhance feature representation, improving segmentation results for objects of various shapes and sizes. Second, we used an attention residual (ATR) module to refine the learning of spatial features, alleviating boundary blurring of occluded objects. Finally, we developed a new strategy, called the auxiliary supervise strategy (ASS), for model optimization to further improve segmentation performance. Our method was tested on the WHU, Inria, and Potsdam datasets and compared with CNN-based and transformer-based methods. Results showed that our method achieved state-of-the-art performance on the WHU building dataset (92.31% IoU), the Inria building dataset (83.71% IoU), and the Potsdam dataset (80.27% MIoU). We conclude that CLCFormer is a flexible, robust, and effective method for the semantic segmentation of VHR images. The code of the proposed model is available at https://github.com/long123524/CLCFormer.
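
The abstract describes a dual-encoder layout that a small sketch may make concrete. Below is a minimal, hypothetical PyTorch rendering of the idea: a CNN branch for fine spatial detail, a transformer branch for long-range context, and per-level fusion before a decoder head. The class names (BiFFMSketch, CrossLearningSketch), channel widths, and all module internals are illustrative assumptions, not the authors' implementation; their released code is at the GitHub link above. Real backbones would be EfficientNet-B3 and SwinV2 (e.g., via timm); tiny stand-in convolutions keep the example self-contained.

```python
# Hedged sketch (not the authors' code) of a dual-encoder "cross-learning"
# segmentation network: CNN branch for spatial detail, transformer branch
# for global context, fused per level, decoded to per-pixel logits.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiFFMSketch(nn.Module):
    """Hypothetical bilateral feature fusion: project both branches to a
    common width, align resolutions, concatenate, and mix with a 3x3 conv."""

    def __init__(self, cnn_ch: int, trans_ch: int, out_ch: int):
        super().__init__()
        self.proj_cnn = nn.Conv2d(cnn_ch, out_ch, kernel_size=1)
        self.proj_trans = nn.Conv2d(trans_ch, out_ch, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_cnn: torch.Tensor, f_trans: torch.Tensor) -> torch.Tensor:
        # Resample transformer features to the CNN branch's spatial size.
        f_trans = F.interpolate(
            self.proj_trans(f_trans), size=f_cnn.shape[2:],
            mode="bilinear", align_corners=False,
        )
        return self.fuse(torch.cat([self.proj_cnn(f_cnn), f_trans], dim=1))


class CrossLearningSketch(nn.Module):
    """Two parallel encoders plus fusion feeding a simple decoder head."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Stand-ins for EfficientNet-B3 / SwinV2 feature extractors.
        self.cnn_branch = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
        self.trans_branch = nn.Conv2d(3, 64, kernel_size=3, stride=4, padding=1)
        self.biffm = BiFFMSketch(cnn_ch=32, trans_ch=64, out_ch=48)
        self.head = nn.Conv2d(48, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = self.biffm(self.cnn_branch(x), self.trans_branch(x))
        # Upsample class logits back to the input resolution.
        return F.interpolate(self.head(fused), size=x.shape[2:],
                             mode="bilinear", align_corners=False)


logits = CrossLearningSketch(num_classes=6)(torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 6, 256, 256])
```

In the paper, fusion happens at multiple encoder levels and the decoder is deeper; the sketch collapses this to one level purely to show the data flow.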

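The reported scores are intersection over union (IoU) for the single-class building datasets (WHU, Inria) and mean IoU (MIoU) over classes for Potsdam. As a quick reference, here is a minimal sketch of the metric; the function name and toy label maps are made up for illustration.

```python
# Sketch of IoU / mean-IoU on integer label maps: per class, the ratio of
# correctly overlapping pixels to the union of predicted and true pixels,
# then averaged over the classes present.
import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))


pred = np.array([[0, 0, 1], [1, 1, 1], [0, 0, 0]])
target = np.array([[0, 1, 1], [1, 1, 0], [0, 0, 0]])
# class 0: inter 4 / union 6; class 1: inter 3 / union 5 -> mean ~0.633
print(round(mean_iou(pred, target, num_classes=2), 3))  # 0.633
```
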
Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2023, Vol. 20, pp. 1-5
Main Authors: Long, Jiang; Li, Mengmeng; Wang, Xiaoqin
Format: Article
Language: English
DOI: 10.1109/LGRS.2023.3262586
ISSN: 1545-598X
EISSN: 1558-0571
Source: IEEE Xplore (Online service)
Subjects:
Artificial neural networks
Auxiliary supervise
Blurring
Buildings
CLCFormer
Coders
Convolution
Convolutional neural networks
convolutional neural networks (CNNs)
Datasets
Feature extraction
High resolution
Image processing
Image resolution
Image segmentation
Learning
Methods
Modules
Neural networks
Optimization
Remote sensing
Resolution
Semantic segmentation
Semantics
Spatial discrimination learning
Tiles
transformer
Transformers
very high-resolution (VHR) images