EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Main Authors: | Zhang, Zhuoyang; Cai, Han; Han, Song |
---|---|
Format: | Conference Proceeding |
Language: | English |
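Subjects: | Codes; Computational modeling; Computer vision; Conferences; Graphics processing units; Image segmentation; Training |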
Online Access: | https://doi.org/10.1109/CVPRW63382.2024.00782 |
Field | Value |
---|---|
container_start_page | 7859 |
container_end_page | 7863 |
creator | Zhang, Zhuoyang; Cai, Han; Han, Song |
description | We present EfficientViT-SAM, a new family of accelerated segment anything models. We retain SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. For the training, we begin with the knowledge distillation from the SAM-ViT-H image encoder to EfficientViT. Subsequently, we conduct end-to-end training on the SA-1B dataset. Benefiting from EfficientViT's efficiency and capacity, EfficientViT-SAM delivers 48.9× measured TensorRT speedup on A100 GPU over SAM-ViT-H without sacrificing performance. Our code and pre-trained models are released at https://github.com/mit-han-lab/efficientvit. |
doi_str_mv | 10.1109/CVPRW63382.2024.00782 |
format | conference_proceeding |
publisher | IEEE |
publication_date | 2024-06-17 |
eisbn | 9798350365474 |
coden | IEEPAD |
ieee_url | https://ieeexplore.ieee.org/document/10677874 |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2160-7516 |
ispartof | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024, p.7859-7863 |
issn | 2160-7516 |
language | eng |
recordid | cdi_ieee_primary_10677874 |
source | IEEE Xplore All Conference Series |
subjects | Codes; Computational modeling; Computer vision; Conferences; Graphics processing units; Image segmentation; Training |
title | EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T19%3A48%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=EfficientViT-SAM:%20Accelerated%20Segment%20Anything%20Model%20Without%20Performance%20Loss&rft.btitle=2024%20IEEE/CVF%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition%20Workshops%20(CVPRW)&rft.au=Zhang,%20Zhuoyang&rft.date=2024-06-17&rft.spage=7859&rft.epage=7863&rft.pages=7859-7863&rft.eissn=2160-7516&rft.coden=IEEPAD&rft_id=info:doi/10.1109/CVPRW63382.2024.00782&rft.eisbn=9798350365474&rft_dat=%3Cieee_CHZPO%3E10677874%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-ieee_primary_106778743%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10677874&rfr_iscdi=true |
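The description above outlines a two-stage recipe whose first stage distills the SAM-ViT-H image encoder into EfficientViT. What follows is a minimal sketch of encoder feature distillation under clearly stated assumptions: the teacher and student are toy linear maps standing in for the SAM-ViT-H and EfficientViT image encoders, the distillation loss is a mean-squared error on the image embeddings (an assumption; this record does not name the loss), and the helper names `matvec`, `mse`, `distill_step`, `eval_loss` and `train_student` are hypothetical. It is not the authors' code, which is released at https://github.com/mit-han-lab/efficientvit.

```python
import random


def matvec(weights, x):
    """Apply a linear map, stored as a list of rows, to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def distill_step(student, teacher, x, lr):
    """One SGD step pulling student(x) toward the frozen teacher(x).

    Assumption: an MSE loss on embeddings. Returns the loss measured
    before the update. Only `student` is modified in place.
    """
    target = matvec(teacher, x)  # teacher embedding; the teacher stays frozen
    pred = matvec(student, x)
    n = len(pred)
    for i, row in enumerate(student):
        grad_out = 2.0 * (pred[i] - target[i]) / n  # dL/dpred_i for MSE
        for j in range(len(row)):
            row[j] -= lr * grad_out * x[j]  # chain rule: dpred_i/dW_ij = x_j
    return mse(pred, target)


def eval_loss(student, teacher, inputs):
    """Average embedding MSE between student and teacher over held-out inputs."""
    return sum(mse(matvec(student, x), matvec(teacher, x)) for x in inputs) / len(inputs)


def train_student(teacher, dim, steps=1000, lr=0.05, seed=0):
    """Distill a zero-initialized student from `teacher` on random Gaussian inputs."""
    rng = random.Random(seed)
    student = [[0.0] * dim for _ in range(len(teacher))]
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for an image
        distill_step(student, teacher, x, lr)
    return student


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 4
    teacher = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    held_out = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(32)]
    before = eval_loss([[0.0] * dim for _ in range(dim)], teacher, held_out)
    student = train_student(teacher, dim)
    after = eval_loss(student, teacher, held_out)
    print(f"held-out embedding MSE: {before:.4f} -> {after:.2e}")
```

Run as a script, it prints the held-out embedding MSE before and after distillation, and the second value should be several orders of magnitude smaller. Because this toy student has exactly the teacher's form, it can match the teacher almost perfectly; a real, much cheaper student such as EfficientViT generally cannot. The second stage in the description, end-to-end training on SA-1B, is not sketched here.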