
Instructional Mask Autoencoder: A Scalable Learner for Hyperspectral Image Classification

An increasing number of hyperspectral images (HSIs) are becoming available, yet the utilization of unlabeled HSIs remains extremely low because of high annotation costs, so it is crucial to figure out how to use these unlabeled HSIs to enhance classification performance. Fortunately, self-supervised training enables us to acquire latent features from unlabeled HSIs, thereby enhancing network performance via transfer learning. However, most current networks for HSIs are inflexible, which makes it challenging for them to perform such learning and to accommodate multimodal HSIs. Therefore, we devise a scalable self-supervised network, called the instructional mask autoencoder, which can extract general patterns of HSIs from unannotated data. It primarily consists of a spatial-spectral embedding block and a transformer-based masked autoencoder, which are used to project input samples into the same latent space and to learn higher-level semantic information, respectively. Moreover, we use a random token, called ins_token, to instruct the model to learn the components of global information that are highly correlated with the target pixel in HSI samples. In the fine-tuning stage, we design a learnable aggregation mechanism to put all tokens into full play. The results show that our method exhibits robust generalization performance and accelerates convergence across diverse datasets. In cases of limited samples, experiments on three structurally distinct HSIs all achieved competitive performance; compared to state-of-the-art methods, our approach demonstrated respective improvements of 1.97%, 0.44%, and 3.35% on these three datasets.
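
The abstract outlines the pretraining pipeline: a spatial-spectral embedding block projects HSI samples into a shared latent space, a transformer encoder processes the visible (unmasked) tokens together with a random instruction token (ins_token), and a decoder reconstructs the masked tokens. The paper's actual implementation is not part of this record, so the following is a minimal PyTorch sketch under those assumptions; all module names, dimensions, and the masking ratio are illustrative choices, not values from the paper.

```python
import torch
import torch.nn as nn

class SpatialSpectralEmbed(nn.Module):
    """Hypothetical embedding block: groups the spectral bands of an HSI
    sample and projects each group into a shared latent space (one token
    per band group), plus a learned positional embedding."""
    def __init__(self, bands_per_token=8, n_bands=200, dim=128):
        super().__init__()
        self.bands_per_token = bands_per_token
        self.n_tokens = n_bands // bands_per_token
        self.proj = nn.Linear(bands_per_token, dim)
        self.pos = nn.Parameter(torch.zeros(1, self.n_tokens, dim))

    def forward(self, x):                       # x: (B, n_bands) pixel spectra
        B = x.shape[0]
        tokens = x.view(B, self.n_tokens, self.bands_per_token)
        return self.proj(tokens) + self.pos     # (B, n_tokens, dim)

class InstructionalMAE(nn.Module):
    """Sketch of masked-autoencoder pretraining with a random instruction
    token (ins_token) appended to the visible sequence."""
    def __init__(self, dim=128, depth=4, heads=4, bands_per_token=8,
                 n_bands=200, mask_ratio=0.75):
        super().__init__()
        self.embed = SpatialSpectralEmbed(bands_per_token, n_bands, dim)
        self.mask_ratio = mask_ratio
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                       batch_first=True), depth)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                       batch_first=True), 1)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, bands_per_token)   # reconstruct bands

    def forward(self, x):
        tokens = self.embed(x)                  # (B, N, D)
        B, N, D = tokens.shape
        n_keep = int(N * (1 - self.mask_ratio))
        # Random masking: keep a subset of tokens, remember the shuffle.
        ids = torch.rand(B, N, device=x.device).argsort(dim=1)
        keep = ids[:, :n_keep]
        visible = tokens.gather(1, keep.unsqueeze(-1).expand(-1, -1, D))
        # Random instruction token appended to the visible sequence.
        ins_token = torch.randn(B, 1, D, device=x.device)
        enc = self.encoder(torch.cat([visible, ins_token], dim=1))[:, :-1]
        # Decoder sees encoded visible tokens plus mask tokens, unshuffled.
        full = torch.cat(
            [enc, self.mask_token.expand(B, N - n_keep, D)], dim=1)
        unshuffle = ids.argsort(dim=1)
        full = full.gather(1, unshuffle.unsqueeze(-1).expand(-1, -1, D))
        recon = self.head(self.decoder(full))   # (B, N, bands_per_token)
        target = x.view(B, N, -1)
        return nn.functional.mse_loss(recon, target)

# Usage: pretrain on unlabeled pixel spectra (batch of 16, 200 bands).
model = InstructionalMAE()
loss = model(torch.randn(16, 200))
loss.backward()
```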

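For the fine-tuning stage, the abstract mentions a learnable aggregation mechanism that puts all tokens into full play rather than classifying from a single token. One plausible reading of that idea is learnable attention pooling: a trained query scores every encoder output token, and the classifier consumes their weighted sum. The sketch below renders that reading and is not the paper's implementation; the class name and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class LearnableAggregation(nn.Module):
    """Attention-style pooling: a learned query scores every encoder
    output token, and their weighted sum feeds the classifier head."""
    def __init__(self, dim=128, n_classes=16):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))
        self.score = nn.Linear(dim, dim)       # projects tokens for scoring
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, tokens):                 # tokens: (B, N, dim)
        # One scalar weight per token, normalized over the sequence.
        scores = self.score(tokens) @ self.query          # (B, N)
        weights = scores.softmax(dim=1).unsqueeze(-1)     # (B, N, 1)
        pooled = (weights * tokens).sum(dim=1)            # (B, dim)
        return self.classifier(pooled)

# Usage: classify from all 25 encoder tokens instead of one class token.
head = LearnableAggregation()
logits = head(torch.randn(16, 25, 128))        # (16, n_classes)
```

Compared with reading off a single class token, this lets fine-tuning weight whichever spatial-spectral tokens are most correlated with the target pixel.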

Bibliographic Details
Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, Vol. 17, p. 1348-1362
Main Authors: Kong, Weili; Liu, Baisen; Bi, Xiaojun; Pei, Jiaming; Chen, Zheng
Format: Article
Language: English
Subjects: Aggregation; Annotations; Classification; Convolutional neural networks; Data models; Datasets; Embedding; Hyperspectral imaging; Image classification; Information processing; Learning; Mask autoencoder; multimodal hyperspectral image (HSI); self-supervised; Spatial resolution; Task analysis; Training; Transfer learning; Transformers; unlabeled HSI
DOI: 10.1109/JSTARS.2023.3337132
ISSN: 1939-1404
EISSN: 2151-1535