
Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression

Recent vision architectures and self-supervised training methods enable vision models that are extremely accurate and general, but come with massive parameter and computational costs. In practical settings, such as camera traps, users have limited resources, and may fine-tune a pretrained model on (often limited) data from a small set of specific categories of interest. These users may wish to make use of modern, highly-accurate models, but are often computationally constrained. To address this, we ask: can we quickly compress large generalist models into accurate and efficient specialists? For this, we propose a simple and versatile technique called Few-Shot Task-Aware Compression (TACO). Given a large vision model that is pretrained to be accurate on a broad task, such as classification over ImageNet-22K, TACO produces a smaller model that is accurate on specialized tasks, such as classification across vehicle types or animal species. Crucially, TACO works in few-shot fashion, i.e. only a few task-specific samples are used, and the procedure has low computational overheads. We validate TACO on highly-accurate ResNet, ViT/DeiT, and ConvNeXt models, originally trained on ImageNet, LAION, or iNaturalist, which we specialize and compress to a diverse set of "downstream" subtasks. TACO can reduce the number of non-zero parameters in existing models by up to 20x relative to the original models, leading to inference speedups of up to 3x, while remaining accuracy-competitive with the uncompressed models on the specialized tasks.
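The record carries no code, but the core idea in the abstract — scoring a pretrained network's weights on a handful of task-specific samples and keeping only the salient ones — can be sketched briefly. The snippet below is a minimal, hypothetical illustration assuming PyTorch/torchvision, not the authors' actual TACO procedure: it uses a simple first-order Taylor saliency as a stand-in, and the function name `prune_few_shot`, the sparsity level, and the calibration batch are all invented for illustration.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

def prune_few_shot(model, images, labels, sparsity=0.9):
    """Zero out the `sparsity` fraction of conv/linear weights with the
    lowest first-order saliency |w * dL/dw| on the few task samples."""
    model.eval()
    model.zero_grad()
    F.cross_entropy(model(images), labels).backward()
    weights = [m.weight for m in model.modules()
               if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    scores = torch.cat([(w * w.grad).abs().flatten() for w in weights])
    # The kth-smallest saliency score becomes the global pruning threshold.
    k = max(1, int(sparsity * scores.numel()))
    threshold = scores.kthvalue(k).values
    with torch.no_grad():
        for w in weights:
            w.mul_(((w * w.grad).abs() > threshold).float())

# Pretrained weights would be loaded in practice, e.g.
# resnet50(weights="IMAGENET1K_V2"); random init keeps this sketch offline.
model = resnet50(weights=None)
# A few-shot calibration batch: e.g. 16 labeled images from the target
# task (random tensors stand in for real data here).
images = torch.randn(16, 3, 224, 224)
labels = torch.randint(0, 1000, (16,))
prune_few_shot(model, images, labels, sparsity=0.9)
remaining = sum(int((m.weight != 0).sum()) for m in model.modules()
                if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)))
print(f"non-zero conv/linear weights after pruning: {remaining}")
```

In practice, a compressed specialist would also need brief fine-tuning on the few-shot data, and the reported inference speedups require a sparsity-aware runtime; the sketch only shows how few-shot, task-aware weight selection can work in principle.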

Bibliographic Details
Published in: arXiv.org, 2023-03
Main Authors: Kuznedelev, Denis, Tabesh, Soroush, Noorbakhsh, Kimia, Frantar, Elias, Beery, Sara, Kurtic, Eldar, Alistarh, Dan
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Classification; Computing costs; Mathematical models; Parameters
Online Access: Get full text