Vision Transformer-based Feature Extraction for Generalized Zero-Shot Learning

Generalized zero-shot learning (GZSL) is a technique to train a deep learning model to identify unseen classes using the image attribute. In this paper, we put forth a new GZSL approach exploiting Vision Transformer (ViT) to maximize the attribute-related information contained in the image feature. In ViT, the entire image region is processed without the degradation of the image resolution and the local image information is preserved in patch features. To fully enjoy these benefits of ViT, we exploit patch features as well as the CLS feature in extracting the attribute-related image feature. In particular, we propose a novel attention-based module, called attribute attention module (AAM), to aggregate the attribute-related information in patch features. In AAM, the correlation between each patch feature and the synthetic image attribute is used as the importance weight for each patch. From extensive experiments on benchmark datasets, we demonstrate that the proposed technique outperforms the state-of-the-art GZSL approaches by a large margin.
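As a reading aid, below is a minimal PyTorch sketch of the mechanism the abstract describes for the attribute attention module (AAM): the correlation between each ViT patch feature and the synthetic image attribute serves as a per-patch importance weight for aggregation. The class name, the linear projection of the attribute into the patch-feature space, the scaled dot product as the correlation measure, and all tensor shapes are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeAttentionModule(nn.Module):
    """Sketch of AAM-style weighted pooling over ViT patch features.

    Hypothetical design: project the attribute vector into the
    patch-feature space, score each patch by a scaled dot product
    against it, and aggregate the patches with the softmax weights.
    """

    def __init__(self, feat_dim: int, attr_dim: int):
        super().__init__()
        # Assumed: a learned projection aligning the attribute space
        # with the patch-feature space so a correlation can be taken.
        self.attr_proj = nn.Linear(attr_dim, feat_dim)

    def forward(self, patch_feats: torch.Tensor, attr: torch.Tensor) -> torch.Tensor:
        # patch_feats: (B, N, D) ViT patch features; attr: (B, A) attribute vector.
        q = self.attr_proj(attr)                              # (B, D)
        scores = torch.einsum("bnd,bd->bn", patch_feats, q)   # correlation per patch
        weights = F.softmax(scores / q.size(-1) ** 0.5, dim=-1)
        # Importance-weighted aggregation of the patch features.
        return torch.einsum("bn,bnd->bd", weights, patch_feats)  # (B, D)

# Example shapes (assumed): ViT-B/16 on 224x224 images yields 196 patch
# features of dimension 768; CUB-style attribute vectors have 312 entries.
aam = AttributeAttentionModule(feat_dim=768, attr_dim=312)
pooled = aam(torch.randn(4, 196, 768), torch.randn(4, 312))  # -> (4, 768)

Per the abstract, such an aggregated patch feature would then be used alongside the CLS feature (e.g., by concatenation) to form the final attribute-related image feature.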

Bibliographic Details
Published in: arXiv.org, 2023-02
Main Authors: Kim, Jiseob; Shim, Kyuhong; Kim, Junhan; Shim, Byonghyo
Format: Article
Language: English
Subjects: Deep learning; Feature extraction; Image resolution; Machine learning; Modules; Zero-shot learning
Identifier: EISSN 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)
Rights: 2023; published under CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)