Sketch Classification and Sketch Based Image Retrieval Using ViT with Self-Distillation for Few Samples
Zero-shot sketch-based image retrieval (SBIR) is a challenging computer vision task: it retrieves photo images relevant to sketch queries not seen during training. For sketch images without sequence information, we propose a modified Vision Transformer (ViT)...
Published in: | Journal of electrical engineering & technology 2024-09, Vol.19 (7), p.4587-4593 |
---|---|
Main Authors: | Kang, Sungjae; Seo, Kisung |
Format: | Article |
Language: | English |
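Subjects: | Electrical Engineering; Electrical Machines and Networks; Electronics and Microelectronics; Engineering; Instrumentation; Original Article; Power Electronics |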
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c242t-46107764eb9e05669842312764249ee6c8d3705b6987cee950ea6891ca1653133 |
container_end_page | 4593 |
container_issue | 7 |
container_start_page | 4587 |
container_title | Journal of electrical engineering & technology |
container_volume | 19 |
creator | Kang, Sungjae; Seo, Kisung |
description | Zero-shot sketch-based image retrieval (SBIR) is a challenging computer vision task: it retrieves photo images relevant to sketch queries not seen during training. For sketch images without sequence information, we propose a modified Vision Transformer (ViT)-based approach that maintains or improves performance while reducing the amount of sketch training data. First, we add a retrieval token and integrate auxiliary classifiers on multiple branches of the ViT network. Second, we apply self-distillation, adding classifiers and embedding vectors to each intermediate layer of the network, to enable fast transfer learning to the sketch domain. Third, to address overfitting caused by the reduced number of input data pairs when training with large datasets, we integrate KL divergence, which captures distribution differences between sketches and photos, into the triplet loss, thereby mitigating the impact of limited sketch-photo samples. Experiments on the TU-Berlin and Sketchy datasets show that our method significantly improves on comparable methods in sketch classification and sketch-based image retrieval. |
doi_str_mv | 10.1007/s42835-024-01889-6 |
format | article |
date | 2024-09-01 |
orcid | https://orcid.org/0000-0002-5256-0582 |
publisher | Singapore: Springer Nature Singapore |
rights | The Author(s) under exclusive licence to The Korean Institute of Electrical Engineers 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
stitle | J. Electr. Eng. Technol |
tpages | 7 |
toplevel | peer_reviewed; online_resources |
fulltext | fulltext |
identifier | ISSN: 1975-0102 |
ispartof | Journal of electrical engineering & technology, 2024-09, Vol.19 (7), p.4587-4593 |
issn | 1975-0102; 2093-7423 |
language | eng |
recordid | cdi_crossref_primary_10_1007_s42835_024_01889_6 |
source | Springer Link |
subjects | Electrical Engineering; Electrical Machines and Networks; Electronics and Microelectronics; Engineering; Instrumentation; Original Article; Power Electronics |
title | Sketch Classification and Sketch Based Image Retrieval Using ViT with Self-Distillation for Few Samples |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T12%3A12%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sketch%20Classification%20and%20Sketch%20Based%20Image%20Retrieval%20Using%20ViT%20with%20Self-Distillation%20for%20Few%20Samples&rft.jtitle=Journal%20of%20electrical%20engineering%20&%20technology&rft.au=Kang,%20Sungjae&rft.date=2024-09-01&rft.volume=19&rft.issue=7&rft.spage=4587&rft.epage=4593&rft.pages=4587-4593&rft.issn=1975-0102&rft.eissn=2093-7423&rft_id=info:doi/10.1007/s42835-024-01889-6&rft_dat=%3Ccrossref_sprin%3E10_1007_s42835_024_01889_6%3C/crossref_sprin%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c242t-46107764eb9e05669842312764249ee6c8d3705b6987cee950ea6891ca1653133%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
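
The description above mentions integrating KL divergence between sketch and photo distributions into the triplet loss. This record does not give the paper's formulation, so the following is only a minimal illustrative sketch of one way such a combination could look. The function names, the use of softmax over class logits, the Euclidean distance, and the `margin` and `kl_weight` values are all assumptions made for illustration and are not taken from the article.

```python
import math


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_kl_loss(anchor, positive, negative,
                    sketch_logits, photo_logits,
                    margin=1.0, kl_weight=0.1):
    """Hypothetical combined loss: triplet term plus a weighted KL term.

    anchor        -- sketch embedding
    positive      -- embedding of a photo from the same class
    negative      -- embedding of a photo from a different class
    sketch_logits -- class logits for the sketch (assumed input)
    photo_logits  -- class logits for the matching photo (assumed input)
    """
    triplet = max(0.0, euclidean(anchor, positive)
                  - euclidean(anchor, negative) + margin)
    kl = kl_divergence(softmax(sketch_logits), softmax(photo_logits))
    return triplet + kl_weight * kl
```

In this sketch the triplet term pulls a sketch embedding toward a matching photo and away from a non-matching one, while the KL term penalizes disagreement between the two predicted class distributions. How the article actually weights or computes these terms may differ.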