Integrating Spatial Details With Long-Range Contexts for Semantic Segmentation of Very High-Resolution Remote-Sensing Images
This letter presents a cross-learning network (CLCFormer) integrating fine-grained spatial details with long-range global contexts, based on convolutional neural networks (CNNs) and a transformer, for semantic segmentation of very high-resolution (VHR) remote-sensing images. More specifically, CLCFormer comprises two parallel encoders, derived from the CNN and the transformer, and a CNN decoder. The encoders use SwinV2 and EfficientNet-B3 as backbones, and the extracted semantic features are aggregated at multiple levels using a bilateral feature fusion module (BiFFM). First, we used attention gate (ATG) modules to enhance feature representation, improving segmentation results for objects of various shapes and sizes. Second, we used an attention residual (ATR) module to refine the learning of spatial features, alleviating boundary blurring of occluded objects. Finally, we developed a new strategy, called the auxiliary supervise strategy (ASS), for model optimization to further improve segmentation performance. Our method was tested on the WHU, Inria, and Potsdam datasets and compared with CNN-based and transformer-based methods. Results showed that our method achieved state-of-the-art performance on the WHU building dataset (92.31% IoU), the Inria building dataset (83.71% IoU), and the Potsdam dataset (80.27% mIoU). We conclude that CLCFormer is a flexible, robust, and effective method for the semantic segmentation of VHR images. The code of the proposed model is available at https://github.com/long123524/CLCFormer.
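As a rough illustration of the dual-encoder design described in the abstract, the snippet below blends a CNN feature map with a transformer feature map through a gated fusion module. This is a minimal sketch only: the names `SimpleBiFFM`, `cnn_dim`, `trans_dim`, and `out_dim` are hypothetical and do not come from the authors' implementation, which is available at the GitHub link above.

```python
# Toy bilateral fusion of a CNN branch and a transformer branch.
# Illustrative only; not the authors' BiFFM (see their GitHub repo).
import torch
import torch.nn as nn


class SimpleBiFFM(nn.Module):
    """Project both branches to a common width, then blend them
    with a learned per-pixel gate."""

    def __init__(self, cnn_dim: int, trans_dim: int, out_dim: int):
        super().__init__()
        self.proj_cnn = nn.Conv2d(cnn_dim, out_dim, kernel_size=1)
        self.proj_trans = nn.Conv2d(trans_dim, out_dim, kernel_size=1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * out_dim, out_dim, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_cnn: torch.Tensor, f_trans: torch.Tensor) -> torch.Tensor:
        c = self.proj_cnn(f_cnn)          # CNN features -> common width
        t = self.proj_trans(f_trans)      # transformer features -> common width
        g = self.gate(torch.cat([c, t], dim=1))  # per-pixel blend weight in [0, 1]
        return g * c + (1.0 - g) * t


# Usage: fuse a 48-channel CNN map with a 96-channel transformer map
# of the same spatial size (channel counts are arbitrary here).
fuse = SimpleBiFFM(cnn_dim=48, trans_dim=96, out_dim=64)
fused = fuse(torch.randn(1, 48, 64, 64), torch.randn(1, 96, 64, 64))  # -> (1, 64, 64, 64)
```

The IoU and mIoU figures quoted in the abstract are standard overlap metrics; a minimal NumPy sketch of per-class IoU and its class-mean follows, again as an illustration rather than the benchmark scripts used in the letter.

```python
# Per-class intersection-over-union and mean IoU over label maps.
import numpy as np


def class_iou(pred: np.ndarray, target: np.ndarray, cls: int) -> float:
    """IoU for one class: |pred AND target| / |pred OR target|."""
    p, t = pred == cls, target == cls
    union = np.logical_or(p, t).sum()
    return float(np.logical_and(p, t).sum() / union) if union else float("nan")


def mean_iou(pred: np.ndarray, target: np.ndarray, n_classes: int) -> float:
    """mIoU: average of per-class IoU, ignoring classes absent from both maps."""
    ious = [class_iou(pred, target, c) for c in range(n_classes)]
    return float(np.mean([v for v in ious if not np.isnan(v)]))
```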
Published in: | IEEE Geoscience and Remote Sensing Letters, 2023, Vol. 20, p. 1-5 |
---|---|
Main Authors: | Long, Jiang; Li, Mengmeng; Wang, Xiaoqin |
Format: | Article |
Language: | English |
Subjects: | Artificial neural networks; Auxiliary supervise; Blurring; Buildings; CLCFormer; Coders; Convolution; Convolutional neural networks (CNNs); Datasets; Feature extraction; High resolution; Image processing; Image resolution; Image segmentation; Learning; Methods; Modules; Neural networks; Optimization; Remote sensing; Resolution; Semantic segmentation; Semantics; Spatial discrimination learning; Tiles; Transformers; Very high-resolution (VHR) images |
DOI: | 10.1109/LGRS.2023.3262586 |
ISSN: | 1545-598X |
EISSN: | 1558-0571 |
Publisher: | IEEE, Piscataway |
Source: | IEEE Xplore (Online service) |
Online Access: | Get full text |
Code: | https://github.com/long123524/CLCFormer |