
Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images

Semantic segmentation of unmanned aerial vehicle (UAV) remote sensing images is a recent research hotspot, offering technical support for diverse types of UAV remote sensing missions. However, unlike general scene images, UAV remote sensing images present inherent challenges. These challenges includ...

Bibliographic Details
Published in:	IEEE journal on miniaturization for air and space systems, 2024-03, Vol.5 (1), p.33-41
Main Authors:	Zhou, Xuanyu; Zhou, Lifan; Gong, Shengrong; Zhang, Haizhen; Zhong, Shan; Xia, Yu; Huang, Yizhou
Format:	Article
Language:	English
Subjects:	Artificial neural networks; Autonomous aerial vehicles; Coders; Complexity; Convolutional neural networks; Feature extraction; Image segmentation; Modules; Remote sensing; Semantic segmentation; Semantics; Swin transformer; Technical services; Transformers; unmanned aerial vehicle (UAV); Unmanned aerial vehicles

cites	cdi_FETCH-LOGICAL-c1987-b70f018d724311d8effbc270da30e0db25eb1cf2f26231a5418428a90b25b7ed3
container_end_page	41
container_issue	1
container_start_page	33
container_title	IEEE journal on miniaturization for air and space systems
container_volume	5
creator	Zhou, Xuanyu; Zhou, Lifan; Gong, Shengrong; Zhang, Haizhen; Zhong, Shan; Xia, Yu; Huang, Yizhou
description	Semantic segmentation of unmanned aerial vehicle (UAV) remote sensing images is a recent research hotspot, offering technical support for diverse types of UAV remote sensing missions. However, unlike general scene images, UAV remote sensing images present inherent challenges. These challenges include the complexity of backgrounds, substantial variations in target scales, and dense arrangements of small targets, which severely hinder the accuracy of semantic segmentation. To address these issues, we propose a convolutional neural network (CNN) and transformer hybrid network for semantic segmentation of UAV remote sensing images. The proposed network follows an encoder-decoder architecture that merges a transformer-based encoder with a CNN-based decoder. First, we incorporate the Swin transformer as the encoder to address the limitations of CNN in global modeling, mitigating the interference caused by complex background information. Second, to effectively handle the significant changes in target scales, we design the multiscale feature integration module (MFIM) that enhances the multiscale feature representation capability of the network. Finally, the semantic feature fusion module (SFFM) is designed to filter the redundant noise during the feature fusion process, which improves the recognition of small targets and edges. Experimental results demonstrate that the proposed method outperforms other popular methods on the UAVid and Aeroscapes datasets.
doi_str_mv	10.1109/JMASS.2023.3332948
format	article
fullrecord	<record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_10319338</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10319338</ieee_id><sourcerecordid>2930961525</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1987-b70f018d724311d8effbc270da30e0db25eb1cf2f26231a5418428a90b25b7ed3</originalsourceid><addsrcrecordid>eNpNkE9LAzEQxYMoWGq_gHgIeN6aTPZP9liK2kqtYFvBU8juTspWN6nJFum3d2t76Gne8N6bgR8ht5wNOWf5w8vraLEYAgMxFEJAHssL0oMkSyPB0_jyTF-TQQgbxhiwWGYSeuRzsi98XdHxfE61rejSaxuM8w16Osf21_kv2q10gY22bV12Yt2gbXVbO0udoavRB33HxrXYWTbUdk2njV5juCFXRn8HHJxmn6yeHpfjSTR7e56OR7Oo5LnMoiJjhnFZZRALziuJxhQlZKzSgiGrCkiw4KUBAykIrpOYyxikzlnnFBlWok_uj3e33v3sMLRq43bedi8V5ILlKU8g6VJwTJXeheDRqK2vG-33ijN1oKj-KaoDRXWi2JXujqUaEc8KgudCSPEHsb1tDg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2930961525</pqid></control><display><type>article</type><title>Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images</title><source>IEEE Xplore (Online service)</source><creator>Zhou, Xuanyu ; Zhou, Lifan ; Gong, Shengrong ; Zhang, Haizhen ; Zhong, Shan ; Xia, Yu ; Huang, Yizhou</creator><creatorcontrib>Zhou, Xuanyu ; Zhou, Lifan ; Gong, Shengrong ; Zhang, Haizhen ; Zhong, Shan ; Xia, Yu ; Huang, Yizhou</creatorcontrib><description>Semantic segmentation of unmanned aerial vehicle (UAV) remote sensing images is a recent research hotspot, offering technical support for diverse types of UAV remote sensing missions. However, unlike general scene images, UAV remote sensing images present inherent challenges. These challenges include the complexity of backgrounds, substantial variations in target scales, and dense arrangements of small targets, which severely hinder the accuracy of semantic segmentation. 
To address these issues, we propose a convolutional neural network (CNN) and transformer hybrid network for semantic segmentation of UAV remote sensing images. The proposed network follows an encoder-decoder architecture that merges a transformer-based encoder with a CNN-based decoder. First, we incorporate the Swin transformer as the encoder to address the limitations of CNN in global modeling, mitigating the interference caused by complex background information. Second, to effectively handle the significant changes in target scales, we design the multiscale feature integration module (MFIM) that enhances the multiscale feature representation capability of the network. Finally, the semantic feature fusion module (SFFM) is designed to filter the redundant noise during the feature fusion process, which improves the recognition of small targets and edges. Experimental results demonstrate that the proposed method outperforms other popular methods on the UAVid and Aeroscapes datasets.</description><identifier>ISSN: 2576-3164</identifier><identifier>EISSN: 2576-3164</identifier><identifier>DOI: 10.1109/JMASS.2023.3332948</identifier><identifier>CODEN: IJMAJI</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Artificial neural networks ; Autonomous aerial vehicles ; Coders ; Complexity ; Convolutional neural networks ; Feature extraction ; Image segmentation ; Modules ; Remote sensing ; Semantic segmentation ; Semantics ; Swin transformer ; Technical services ; Transformers ; unmanned aerial vehicle (UAV) ; Unmanned aerial vehicles</subject><ispartof>IEEE journal on miniaturization for air and space systems, 2024-03, Vol.5 (1), p.33-41</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c1987-b70f018d724311d8effbc270da30e0db25eb1cf2f26231a5418428a90b25b7ed3</cites><orcidid>0000-0003-0266-2422 ; 0000-0003-0034-6952 ; 0009-0002-0633-694X ; 0000-0001-7665-413X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10319338$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,777,781,27905,27906,54777</link.rule.ids></links><search><creatorcontrib>Zhou, Xuanyu</creatorcontrib><creatorcontrib>Zhou, Lifan</creatorcontrib><creatorcontrib>Gong, Shengrong</creatorcontrib><creatorcontrib>Zhang, Haizhen</creatorcontrib><creatorcontrib>Zhong, Shan</creatorcontrib><creatorcontrib>Xia, Yu</creatorcontrib><creatorcontrib>Huang, Yizhou</creatorcontrib><title>Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images</title><title>IEEE journal on miniaturization for air and space systems</title><addtitle>JMASS</addtitle><description>Semantic segmentation of unmanned aerial vehicle (UAV) remote sensing images is a recent research hotspot, offering technical support for diverse types of UAV remote sensing missions. However, unlike general scene images, UAV remote sensing images present inherent challenges. These challenges include the complexity of backgrounds, substantial variations in target scales, and dense arrangements of small targets, which severely hinder the accuracy of semantic segmentation. To address these issues, we propose a convolutional neural network (CNN) and transformer hybrid network for semantic segmentation of UAV remote sensing images. The proposed network follows an encoder-decoder architecture that merges a transformer-based encoder with a CNN-based decoder. 
First, we incorporate the Swin transformer as the encoder to address the limitations of CNN in global modeling, mitigating the interference caused by complex background information. Second, to effectively handle the significant changes in target scales, we design the multiscale feature integration module (MFIM) that enhances the multiscale feature representation capability of the network. Finally, the semantic feature fusion module (SFFM) is designed to filter the redundant noise during the feature fusion process, which improves the recognition of small targets and edges. Experimental results demonstrate that the proposed method outperforms other popular methods on the UAVid and Aeroscapes datasets.</description><subject>Artificial neural networks</subject><subject>Autonomous aerial vehicles</subject><subject>Coders</subject><subject>Complexity</subject><subject>Convolutional neural networks</subject><subject>Feature extraction</subject><subject>Image segmentation</subject><subject>Modules</subject><subject>Remote sensing</subject><subject>Semantic segmentation</subject><subject>Semantics</subject><subject>Swin transformer</subject><subject>Technical services</subject><subject>Transformers</subject><subject>unmanned aerial vehicle (UAV)</subject><subject>Unmanned aerial vehicles</subject><issn>2576-3164</issn><issn>2576-3164</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNpNkE9LAzEQxYMoWGq_gHgIeN6aTPZP9liK2kqtYFvBU8juTspWN6nJFum3d2t76Gne8N6bgR8ht5wNOWf5w8vraLEYAgMxFEJAHssL0oMkSyPB0_jyTF-TQQgbxhiwWGYSeuRzsi98XdHxfE61rejSaxuM8w16Osf21_kv2q10gY22bV12Yt2gbXVbO0udoavRB33HxrXYWTbUdk2njV5juCFXRn8HHJxmn6yeHpfjSTR7e56OR7Oo5LnMoiJjhnFZZRALziuJxhQlZKzSgiGrCkiw4KUBAykIrpOYyxikzlnnFBlWok_uj3e33v3sMLRq43bedi8V5ILlKU8g6VJwTJXeheDRqK2vG-33ijN1oKj-KaoDRXWi2JXujqUaEc8KgudCSPEHsb1tDg</recordid><startdate>20240301</startdate><enddate>20240301</enddate><creator>Zhou, Xuanyu</creator><creator>Zhou, 
Lifan</creator><creator>Gong, Shengrong</creator><creator>Zhang, Haizhen</creator><creator>Zhong, Shan</creator><creator>Xia, Yu</creator><creator>Huang, Yizhou</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>H8D</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-0266-2422</orcidid><orcidid>https://orcid.org/0000-0003-0034-6952</orcidid><orcidid>https://orcid.org/0009-0002-0633-694X</orcidid><orcidid>https://orcid.org/0000-0001-7665-413X</orcidid></search><sort><creationdate>20240301</creationdate><title>Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images</title><author>Zhou, Xuanyu ; Zhou, Lifan ; Gong, Shengrong ; Zhang, Haizhen ; Zhong, Shan ; Xia, Yu ; Huang, Yizhou</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1987-b70f018d724311d8effbc270da30e0db25eb1cf2f26231a5418428a90b25b7ed3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Artificial neural networks</topic><topic>Autonomous aerial vehicles</topic><topic>Coders</topic><topic>Complexity</topic><topic>Convolutional neural networks</topic><topic>Feature extraction</topic><topic>Image segmentation</topic><topic>Modules</topic><topic>Remote sensing</topic><topic>Semantic segmentation</topic><topic>Semantics</topic><topic>Swin transformer</topic><topic>Technical services</topic><topic>Transformers</topic><topic>unmanned aerial vehicle (UAV)</topic><topic>Unmanned aerial vehicles</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Xuanyu</creatorcontrib><creatorcontrib>Zhou, Lifan</creatorcontrib><creatorcontrib>Gong, 
Shengrong</creatorcontrib><creatorcontrib>Zhang, Haizhen</creatorcontrib><creatorcontrib>Zhong, Shan</creatorcontrib><creatorcontrib>Xia, Yu</creatorcontrib><creatorcontrib>Huang, Yizhou</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Aerospace Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE journal on miniaturization for air and space systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhou, Xuanyu</au><au>Zhou, Lifan</au><au>Gong, Shengrong</au><au>Zhang, Haizhen</au><au>Zhong, Shan</au><au>Xia, Yu</au><au>Huang, Yizhou</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images</atitle><jtitle>IEEE journal on miniaturization for air and space systems</jtitle><stitle>JMASS</stitle><date>2024-03-01</date><risdate>2024</risdate><volume>5</volume><issue>1</issue><spage>33</spage><epage>41</epage><pages>33-41</pages><issn>2576-3164</issn><eissn>2576-3164</eissn><coden>IJMAJI</coden><abstract>Semantic segmentation of unmanned aerial vehicle (UAV) remote sensing images is a recent research hotspot, offering technical support for diverse types of UAV remote sensing missions. However, unlike general scene images, UAV remote sensing images present inherent challenges. 
These challenges include the complexity of backgrounds, substantial variations in target scales, and dense arrangements of small targets, which severely hinder the accuracy of semantic segmentation. To address these issues, we propose a convolutional neural network (CNN) and transformer hybrid network for semantic segmentation of UAV remote sensing images. The proposed network follows an encoder-decoder architecture that merges a transformer-based encoder with a CNN-based decoder. First, we incorporate the Swin transformer as the encoder to address the limitations of CNN in global modeling, mitigating the interference caused by complex background information. Second, to effectively handle the significant changes in target scales, we design the multiscale feature integration module (MFIM) that enhances the multiscale feature representation capability of the network. Finally, the semantic feature fusion module (SFFM) is designed to filter the redundant noise during the feature fusion process, which improves the recognition of small targets and edges. Experimental results demonstrate that the proposed method outperforms other popular methods on the UAVid and Aeroscapes datasets.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/JMASS.2023.3332948</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0003-0266-2422</orcidid><orcidid>https://orcid.org/0000-0003-0034-6952</orcidid><orcidid>https://orcid.org/0009-0002-0633-694X</orcidid><orcidid>https://orcid.org/0000-0001-7665-413X</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 2576-3164
ispartof	IEEE journal on miniaturization for air and space systems, 2024-03, Vol.5 (1), p.33-41
issn	2576-3164 2576-3164
language	eng
recordid	cdi_ieee_primary_10319338
source	IEEE Xplore (Online service)
subjects	Artificial neural networks; Autonomous aerial vehicles; Coders; Complexity; Convolutional neural networks; Feature extraction; Image segmentation; Modules; Remote sensing; Semantic segmentation; Semantics; Swin transformer; Technical services; Transformers; unmanned aerial vehicle (UAV); Unmanned aerial vehicles
title	Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T15%3A17%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hybrid%20CNN%20and%20Transformer%20Network%20for%20Semantic%20Segmentation%20of%20UAV%20Remote%20Sensing%20Images&rft.jtitle=IEEE%20journal%20on%20miniaturization%20for%20air%20and%20space%20systems&rft.au=Zhou,%20Xuanyu&rft.date=2024-03-01&rft.volume=5&rft.issue=1&rft.spage=33&rft.epage=41&rft.pages=33-41&rft.issn=2576-3164&rft.eissn=2576-3164&rft.coden=IJMAJI&rft_id=info:doi/10.1109/JMASS.2023.3332948&rft_dat=%3Cproquest_ieee_%3E2930961525%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c1987-b70f018d724311d8effbc270da30e0db25eb1cf2f26231a5418428a90b25b7ed3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2930961525&rft_id=info:pmid/&rft_ieee_id=10319338&rfr_iscdi=true