Sketch Classification and Sketch Based Image Retrieval Using ViT with Self-Distillation for Few Samples

Zero-shot sketch-based image retrieval (SBIR) is a challenging task in computer vision: it must retrieve photo images relevant to sketch queries that were not seen in the training phase. For sketch images without sequence information, we propose a modified Vision Transformer (ViT)...

Bibliographic Details
Published in: Journal of electrical engineering & technology 2024-09, Vol.19 (7), p.4587-4593
Main Authors: Kang, Sungjae; Seo, Kisung
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c242t-46107764eb9e05669842312764249ee6c8d3705b6987cee950ea6891ca1653133
container_end_page 4593
container_issue 7
container_start_page 4587
container_title Journal of electrical engineering & technology
container_volume 19
creator Kang, Sungjae
Seo, Kisung
description Zero-shot sketch-based image retrieval (SBIR) is a challenging task in computer vision: it must retrieve photo images relevant to sketch queries that were not seen in the training phase. For sketch images without sequence information, we propose a modified Vision Transformer (ViT)-based approach that maintains or improves performance while reducing the amount of sketch training data. First, we add a token for retrieval and integrate auxiliary classifiers into a multi-branch ViT network. Second, we apply self-distillation to enable fast transfer learning to the sketch domain, adding classifiers and embedding vectors to each intermediate layer of the network. Third, to address the overfitting caused by the reduced number of input data pairs when training with large datasets, we integrate KL-divergence, which captures distribution differences between sketches and photos, into the triplet loss, thereby mitigating the impact of limited sketch-photo samples. Experiments on the TU-Berlin and Sketchy datasets show that our method achieves a significant improvement over similar methods on sketch classification and sketch-based image retrieval.
doi_str_mv 10.1007/s42835-024-01889-6
format article
addtitle J. Electr. Eng. Technol
collection CrossRef
date 2024-09-01
eissn 2093-7423
lds50 peer_reviewed
orcid https://orcid.org/0000-0002-5256-0582
publisher Singapore: Springer Nature Singapore
rights The Author(s) under exclusive licence to The Korean Institute of Electrical Engineers 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
sourcetype Aggregation Database
tpages 7
fulltext fulltext
identifier ISSN: 1975-0102
ispartof Journal of electrical engineering & technology, 2024-09, Vol.19 (7), p.4587-4593
issn 1975-0102
2093-7423
language eng
recordid cdi_crossref_primary_10_1007_s42835_024_01889_6
source Springer Link
subjects Electrical Engineering
Electrical Machines and Networks
Electronics and Microelectronics
Engineering
Instrumentation
Original Article
Power Electronics
title Sketch Classification and Sketch Based Image Retrieval Using ViT with Self-Distillation for Few Samples
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T12%3A12%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sketch%20Classification%20and%20Sketch%20Based%20Image%20Retrieval%20Using%20ViT%20with%20Self-Distillation%20for%20Few%20Samples&rft.jtitle=Journal%20of%20electrical%20engineering%20&%20technology&rft.au=Kang,%20Sungjae&rft.date=2024-09-01&rft.volume=19&rft.issue=7&rft.spage=4587&rft.epage=4593&rft.pages=4587-4593&rft.issn=1975-0102&rft.eissn=2093-7423&rft_id=info:doi/10.1007/s42835-024-01889-6&rft_dat=%3Ccrossref_sprin%3E10_1007_s42835_024_01889_6%3C/crossref_sprin%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c242t-46107764eb9e05669842312764249ee6c8d3705b6987cee950ea6891ca1653133%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true