What does a platypus look like? Generating customized prompts for zero-shot image classification
Published in: | arXiv.org, 2023-12 |
---|---|
Main Authors: | Pratt, Sarah; Covert, Ian; Liu, Rosanne; Farhadi, Ali |
Format: | Article |
Language: | English |
Subjects: | Classification; Customization; Explicit knowledge; Image classification; Natural language; Sentences |
Online Access: | Get full text |
container_title | arXiv.org |
---|---|
creator | Pratt, Sarah; Covert, Ian; Liu, Rosanne; Farhadi, Ali |
description | Open-vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language during inference. This natural language, called "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}") which are completed with each of the category names. This work introduces a simple method to generate higher accuracy prompts, without relying on any explicit knowledge of the task domain and with far fewer hand-constructed sentences. To achieve this, we combine open-vocabulary models with large language models (LLMs) to create Customized Prompts via Language models (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs in order to generate many descriptive sentences that contain important discriminating characteristics of the image categories. This allows the model to place a greater importance on these regions in the image when making predictions. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including over one percentage point gain on ImageNet. Finally, this simple baseline requires no additional training and remains completely zero-shot. Code available at https://github.com/sarahpratt/CuPL. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-12 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2711616998 |
source | Publicly Available Content Database |
subjects | Classification; Customization; Explicit knowledge; Image classification; Natural language; Sentences |
title | What does a platypus look like? Generating customized prompts for zero-shot image classification |
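The description above outlines the CuPL pipeline: a large language model writes many descriptive sentences for each category, and an open-vocabulary model such as CLIP averages their text embeddings into a per-class classifier. The sketch below illustrates that idea only; it assumes the open-source `clip` package (https://github.com/openai/CLIP), and `generate_descriptions` is a hypothetical stand-in for the LLM call. For the authors' actual implementation, see https://github.com/sarahpratt/CuPL.

```python
# Minimal sketch of the CuPL idea from the abstract: instead of a
# hand-written template ("a photo of a {}"), an LLM generates descriptive
# sentences per class, which are embedded and averaged into a classifier.
# The prompt wording and averaging scheme here are illustrative, not the
# paper's exact recipe.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def generate_descriptions(category: str) -> list[str]:
    """Hypothetical placeholder for an LLM query such as
    'Describe what a {category} looks like.' CuPL would return many
    generated sentences; this stub returns two fixed ones."""
    return [f"A photo of a {category}.",
            f"A {category}, which has distinctive visual features."]

@torch.no_grad()
def class_embedding(category: str) -> torch.Tensor:
    # Embed every generated sentence, then average into one class vector.
    prompts = generate_descriptions(category)
    tokens = clip.tokenize(prompts).to(device)
    feats = model.encode_text(tokens)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize each
    mean = feats.mean(dim=0)                          # average over prompts
    return mean / mean.norm()

@torch.no_grad()
def classify(image_path: str, categories: list[str]) -> str:
    # Zero-shot prediction: highest cosine similarity between the image
    # embedding and each category's averaged prompt embedding.
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    img = model.encode_image(image)
    img = img / img.norm(dim=-1, keepdim=True)
    weights = torch.stack([class_embedding(c) for c in categories], dim=1)
    scores = (img @ weights).squeeze(0)
    return categories[scores.argmax().item()]

# e.g. classify("photo.jpg", ["platypus", "beaver", "otter"])
```

Averaging unit-normalized prompt embeddings, rather than raw ones, keeps each generated sentence's contribution comparable; the stub returns two fixed sentences where CuPL would sample many LLM generations per class.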