IRSTFormer: A Hierarchical Vision Transformer for Infrared Small Target Detection
Infrared small target detection plays an important role in infrared search and track systems. The most common infrared image size has grown to 640×512, and the field of view (FOV) has also increased significantly. As a result, there is more interference that hinders the detection of smal...
Published in: | Remote sensing (Basel, Switzerland), 2022-07, Vol.14 (14), p.3258 |
---|---|
Main Authors: | Chen, Gao; Wang, Weihua; Tan, Sirui |
Format: | Article |
Language: | English |
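Subjects: | Ablation; Adaptability; Artificial neural networks; Datasets; deep learning; Design; False alarms; Field of view; Infrared imagery; infrared small target detection; Infrared tracking; Methods; Neural networks; Optimization; Remote sensing; self-attention; Sensors; Target detection; transformer |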
cited_by | cdi_FETCH-LOGICAL-c291t-3f777160aa809a46b3c064b73fd65f0e5b4aca1edecf2c885604d4e2fd2c50223 |
---|---|
cites | cdi_FETCH-LOGICAL-c291t-3f777160aa809a46b3c064b73fd65f0e5b4aca1edecf2c885604d4e2fd2c50223 |
container_end_page | |
container_issue | 14 |
container_start_page | 3258 |
container_title | Remote sensing (Basel, Switzerland) |
container_volume | 14 |
creator | Chen, Gao ; Wang, Weihua ; Tan, Sirui |
description | Infrared small target detection plays an important role in infrared search and track systems. The most common infrared image size has grown to 640×512, and the field of view (FOV) has also increased significantly. As a result, there is more interference that hinders the detection of small targets in the image. However, traditional model-driven methods lack the capability of feature learning, which results in poor adaptability to varied scenes. Owing to the locality of convolution kernels, recent convolutional neural networks (CNNs) cannot model long-range dependencies in the image to suppress false alarms. In this paper, we propose a hierarchical vision transformer-based method for infrared small target detection in images with larger size and FOV (640×512). Specifically, we design a hierarchical overlapped small patch transformer (HOSPT), instead of a CNN, to encode multi-scale features from a single-frame image. For the decoder, a top-down feature aggregation module (TFAM) is adopted to fuse features from adjacent scales. Furthermore, after analyzing existing loss functions, a simple yet effective combination is used to optimize network convergence. Compared with other state-of-the-art methods, the normalized intersection-over-union (nIoU) on our IRST640 dataset and the public SIRST dataset reaches 0.856 and 0.758, respectively. Detailed ablation experiments are conducted to validate the effectiveness and reasonableness of each component of the method. |
doi_str_mv | 10.3390/rs14143258 |
format | article |
publisher | Basel: MDPI AG |
rights | 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
date | 2022-07-01 |
eissn | 2072-4292 |
genre | article |
oa | free_for_read |
orcid | https://orcid.org/0000-0002-5646-9970 |
pqid | 2694059958 |
toplevel | peer_reviewed ; online_resources |
fulltext | fulltext |
identifier | ISSN: 2072-4292 |
ispartof | Remote sensing (Basel, Switzerland), 2022-07, Vol.14 (14), p.3258 |
issn | 2072-4292 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_f573306f2efa4a908ea041f12f892b7c |
source | Publicly Available Content Database |
subjects | Ablation ; Adaptability ; Artificial neural networks ; Datasets ; deep learning ; Design ; False alarms ; Field of view ; Infrared imagery ; infrared small target detection ; Infrared tracking ; Methods ; Neural networks ; Optimization ; Remote sensing ; self-attention ; Sensors ; Target detection ; transformer |
title | IRSTFormer: A Hierarchical Vision Transformer for Infrared Small Target Detection |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T22%3A18%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=IRSTFormer:%20A%20Hierarchical%20Vision%20Transformer%20for%20Infrared%20Small%20Target%20Detection&rft.jtitle=Remote%20sensing%20(Basel,%20Switzerland)&rft.au=Chen,%20Gao&rft.date=2022-07-01&rft.volume=14&rft.issue=14&rft.spage=3258&rft.pages=3258-&rft.issn=2072-4292&rft.eissn=2072-4292&rft_id=info:doi/10.3390/rs14143258&rft_dat=%3Cproquest_doaj_%3E2694059958%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c291t-3f777160aa809a46b3c064b73fd65f0e5b4aca1edecf2c885604d4e2fd2c50223%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2694059958&rft_id=info:pmid/&rfr_iscdi=true |
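
The description above reports results as normalized intersection-over-union (nIoU). As a rough illustration of what that metric measures, the sketch below computes nIoU as the mean of per-image IoU scores over binary masks and contrasts it with IoU pooled over the whole dataset. This is a minimal sketch of the commonly used definition, not the authors' evaluation code: the function names (`per_image_iou`, `niou`, `pooled_iou`), the use of already-thresholded masks, and the convention of scoring an image with an empty prediction and an empty ground truth as 1.0 are all assumptions made for illustration.

```python
import numpy as np


def per_image_iou(pred, mask):
    """IoU of one binary prediction against one binary ground-truth mask."""
    pred = np.asarray(pred, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    inter = np.logical_and(pred, mask).sum()
    union = np.logical_or(pred, mask).sum()
    # Assumed convention: empty prediction and empty mask count as a perfect match.
    return 1.0 if union == 0 else float(inter) / float(union)


def niou(preds, masks):
    """Normalized IoU: average the IoU computed separately for each image."""
    return float(np.mean([per_image_iou(p, m) for p, m in zip(preds, masks)]))


def pooled_iou(preds, masks):
    """Plain IoU: pool intersections and unions over all images, then divide."""
    inter = 0
    union = 0
    for p, m in zip(preds, masks):
        p = np.asarray(p, dtype=bool)
        m = np.asarray(m, dtype=bool)
        inter += int(np.logical_and(p, m).sum())
        union += int(np.logical_or(p, m).sum())
    return 1.0 if union == 0 else inter / union


if __name__ == "__main__":
    # Two toy 2x2 images: the first has one false-alarm pixel, the second is exact.
    preds = [[[1, 1], [0, 0]], [[0, 1], [0, 0]]]
    masks = [[[1, 0], [0, 0]], [[0, 1], [0, 0]]]
    print("nIoU:", niou(preds, masks))
    print("pooled IoU:", pooled_iou(preds, masks))
```

Averaging per image gives every image equal weight regardless of target size, which is why nIoU is often preferred when targets occupy only a few pixels; the pooled variant is dominated by images with larger targets.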