
CoT: Contourlet Transformer for Hierarchical Semantic Segmentation

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-02, Vol. PP, p. 1-15
Main Authors: Shao, Yilin, Sun, Long, Jiao, Licheng, Liu, Xu, Liu, Fang, Li, Lingling, Yang, Shuyuan
Format: Article
Language:English
description The Transformer-convolutional neural network (CNN) hybrid learning approach is gaining traction for balancing deep and shallow image features in hierarchical semantic segmentation. However, such hybrids still face a contradiction between comprehensive semantic understanding and meticulous detail extraction. To resolve this, this article proposes a novel Transformer-CNN hybrid hierarchical network, dubbed contourlet transformer (CoT). In the CoT framework, the Transformer's semantic representation is unavoidably peppered with sparsely distributed points that, although unwanted, demand finer detail. We therefore design a deep detail representation (DDR) structure to investigate their fine-grained features. First, through the contourlet transform (CT), we distill the high-frequency directional components from the raw image, yielding localized features that accommodate the inductive bias of CNNs. Second, a CNN deep sparse learning (DSL) module takes them as input to represent the underlying detailed features. This memory- and energy-efficient learning method keeps the same sparse pattern between input and output. Finally, the decoder hierarchically fuses the detailed features with the semantic features in an image-reconstruction-like fashion. Experiments demonstrate that CoT achieves competitive performance on three benchmark datasets: PASCAL Context (57.21% mean intersection over union (mIoU)), ADE20K (54.16% mIoU), and Cityscapes (84.23% mIoU). Furthermore, robustness studies validate its resistance to various types of corruption. Our code is available at: https://github.com/yilinshao/CoT-Contourlet-Transformer.
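The benchmark figures above are reported in mean intersection over union (mIoU), the standard semantic-segmentation metric: per-class IoU is computed from a pixel-level confusion matrix and then averaged over classes. A minimal sketch of that computation (the function name and example matrix are illustrative, not taken from the paper's released code):

```python
import numpy as np

def mean_iou(conf: np.ndarray) -> float:
    """Mean intersection-over-union from a class confusion matrix.

    conf[i, j] = number of pixels with ground-truth class i predicted as class j.
    IoU for class i = TP / (TP + FP + FN)
                    = conf[i, i] / (row_sum[i] + col_sum[i] - conf[i, i]).
    Classes absent from both prediction and ground truth are skipped.
    """
    tp = np.diag(conf).astype(float)
    union = conf.sum(axis=1) + conf.sum(axis=0) - tp
    iou = tp / np.where(union == 0, 1, union)  # guard division for empty classes
    return float(iou[union > 0].mean())

# Tiny two-class example:
conf = np.array([[3, 1],
                 [1, 5]])
# class 0: 3 / (4 + 4 - 3) = 0.6;  class 1: 5 / (6 + 6 - 5) ≈ 0.714
print(mean_iou(conf))  # → ~0.6571
```

The same arithmetic, run over every pixel of a test set, yields scores such as the 84.23% mIoU quoted for Cityscapes.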
doi 10.1109/TNNLS.2024.3367901
pmid 38408011
ISSN: 2162-237X
EISSN: 2162-2388
Source: IEEE Electronic Library (IEL) Journals
Subjects: Computed tomography
Contourlet transform (CT)
Convolutional neural networks
Feature extraction
Semantic segmentation
Semantics
sparse convolution
Task analysis
Transformers
Transformer–convolutional neural network (CNN) hybrid model