Spatial-Spectral Transformer With Conditional Position Encoding for Hyperspectral Image Classification

Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2024, Vol. 21, pp. 1-5
Main Authors: Ahmad, Muhammad, Usama, Muhammad, Khan, Adil Mehmood, Distefano, Salvatore, Altuwaijri, Hamad Ahmed, Mazzara, Manuel
Format: Article
Language: English
Abstract: In Transformer-based hyperspectral image classification (HSIC), predefined positional encodings (PEs) are crucial for capturing the order of input tokens. However, their typical representation as fixed-dimensional learnable vectors makes it difficult to adapt to variable-length input sequences, limiting the broader application of Transformers to HSIC. To address this issue, this study introduces an implicit conditional PE (CPE) scheme in a Transformer for HSIC, conditioned on each input token's local neighborhood. The proposed spatial-spectral Transformer (SSFormer) integrates spatial-spectral information and enhances classification performance by incorporating the CPE mechanism, increasing the Transformer layers' capacity to preserve contextual relationships within the HSI data. Moreover, SSFormer applies cross-attention between patches and the proposed learnable embeddings, enabling the model to capture global and local features simultaneously while addressing the constraint of limited training samples in a computationally efficient manner. Extensive experiments on publicly available HSI benchmark datasets validate the effectiveness of the proposed SSFormer model, which achieves classification accuracies of 97.7% on the Indian Pines dataset and 96.08% on the University of Houston dataset.
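The core CPE idea the abstract describes — deriving each token's positional encoding from its local spatial neighborhood rather than looking it up in a fixed-size table — can be sketched as below. This is an illustrative assumption, not the paper's implementation: the function name `conditional_pe` and the 3x3 mean-pooling (a stand-in for the depthwise convolution commonly used to realize CPE) are hypothetical.

```python
# Illustrative sketch (not the paper's code): a conditional positional
# encoding (CPE) computes each token's PE from its local neighborhood,
# so it adapts to any patch-grid size, unlike a fixed (max_len, d) PE table.

def conditional_pe(tokens, h, w):
    """tokens: h*w vectors (lists of floats) in row-major order.
    Each PE is the mean of the token's 3x3 spatial neighborhood
    (averaged over the valid neighbors at grid borders) -- a simple
    stand-in for the depthwise convolution often used to implement CPE."""
    d = len(tokens[0])
    pes = []
    for i in range(h):
        for j in range(w):
            acc, n = [0.0] * d, 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc = [a + x for a, x in zip(acc, tokens[ii * w + jj])]
                        n += 1
            pes.append([a / n for a in acc])
    return pes

# The same function handles any grid size without retraining -- the
# variable-length property that fixed learnable PE vectors lack.
pe_small = conditional_pe([[1.0, 2.0]] * 6, 2, 3)   # 2x3 patch grid
pe_large = conditional_pe([[1.0, 2.0]] * 12, 3, 4)  # 3x4 patch grid
```

Because the encoding is a function of the input rather than a stored table, it preserves translation-related locality cues while imposing no maximum sequence length.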
DOI: 10.1109/LGRS.2024.3431188
ISSN: 1545-598X
EISSN: 1558-0571
Source: IEEE Electronic Library (IEL) Journals
Subjects:
Accuracy
Classification
Computational modeling
Datasets
Feature extraction
Geoscience and remote sensing
Hyperspectral image classification (HSIC)
Hyperspectral imaging
Image classification
Sequences
spatial–spectral Transformer (SSFormer)
Three-dimensional displays
Training
Transformers
Vectors