EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Bibliographic Details
Main Authors: Zhang, Zhuoyang; Cai, Han; Han, Song
Format: Conference Proceeding
Language: English
Published in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024, pp. 7859-7863
Publisher: IEEE
EISSN: 2160-7516
EISBN: 9798350365474
DOI: 10.1109/CVPRW63382.2024.00782
Subjects: Codes; Computational modeling; Computer vision; Conferences; Graphics processing units; Image segmentation; Training
Source: IEEE Xplore All Conference Series
Online Access: Request full text

Abstract
We present EfficientViT-SAM, a new family of accelerated segment anything models. We retain SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. For training, we begin with knowledge distillation from the SAM-ViT-H image encoder to EfficientViT. Subsequently, we conduct end-to-end training on the SA-1B dataset. Benefiting from EfficientViT's efficiency and capacity, EfficientViT-SAM delivers a 48.9× measured TensorRT speedup on an A100 GPU over SAM-ViT-H without sacrificing performance. Our code and pre-trained models are released at https://github.com/mit-han-lab/efficientvit.
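
The training recipe in the abstract has two stages, and the first aligns EfficientViT's image embeddings with those of the frozen SAM-ViT-H encoder. Below is a minimal PyTorch sketch of that feature-distillation step; the `teacher` and `student` modules, the dummy data, and the MSE objective are illustrative assumptions, not the authors' exact implementation. The sketch only mirrors SAM's known interface: 1024×1024 inputs mapped to 256-channel 64×64 image embeddings.

import torch
import torch.nn as nn

# Stand-ins for the real encoders: a single strided conv that maps a
# 1024x1024 image to a (256, 64, 64) embedding, matching SAM's encoder
# output shape. In practice, teacher = SAM-ViT-H, student = EfficientViT.
teacher = nn.Conv2d(3, 256, kernel_size=16, stride=16)
student = nn.Conv2d(3, 256, kernel_size=16, stride=16)

teacher.eval()
for p in teacher.parameters():  # the teacher stays frozen throughout
    p.requires_grad_(False)

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # assumed distillation objective on embeddings

for step in range(10):  # dummy batches stand in for SA-1B images
    images = torch.randn(1, 3, 1024, 1024)
    with torch.no_grad():
        target = teacher(images)   # teacher embeddings: (1, 256, 64, 64)
    pred = student(images)         # student embeddings, same shape
    loss = loss_fn(pred, target)   # pull student features toward teacher's
    opt.zero_grad()
    loss.backward()
    opt.step()

After this alignment stage, the distilled encoder replaces SAM's heavy image encoder in front of the retained prompt encoder and mask decoder, and the full model is trained end-to-end on SA-1B, as the abstract describes.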