
What does a platypus look like? Generating customized prompts for zero-shot image classification

Bibliographic Details
Published in: arXiv.org, 2023-12
Main Authors: Pratt, Sarah; Covert, Ian; Liu, Rosanne; Farhadi, Ali
Format: Article
Language: English
Subjects: Classification; Customization; Explicit knowledge; Image classification; Natural language; Sentences
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Source: Publicly Available Content Database
Online Access: https://www.proquest.com/docview/2711616998
Description: Open-vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language at inference time. This natural language, referred to as "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}"), each completed with the category names. This work introduces a simple method for generating higher-accuracy prompts, without relying on any explicit knowledge of the task domain and with far fewer hand-constructed sentences. To achieve this, we combine open-vocabulary models with large language models (LLMs) to create Customized Prompts via Language models (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs to generate many descriptive sentences that capture the important discriminating characteristics of each image category, which allows the model to place greater importance on the corresponding regions of the image when making predictions. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including a gain of more than one percentage point on ImageNet. The method requires no additional training and remains completely zero-shot. Code is available at https://github.com/sarahpratt/CuPL.
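
The description above outlines the CuPL recipe: ask an LLM for descriptive sentences about each category, embed those sentences with an open-vocabulary model such as CLIP, and average the embeddings into one classifier weight per class. The following is a minimal sketch of that pipeline, assuming the open-source clip package (https://github.com/openai/CLIP); the llm_descriptions dictionary stands in for LLM-generated sentences and is illustrative rather than taken from the paper.

    # Minimal CuPL-style zero-shot classifier (a sketch, not the authors' code).
    # Assumes: pip install torch pillow git+https://github.com/openai/CLIP.git
    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Hypothetical LLM outputs: a few descriptive sentences per category,
    # e.g. answers to "Describe what a platypus looks like."
    llm_descriptions = {
        "platypus": [
            "A platypus is a furry brown animal with a flat, duck-like bill.",
            "A photo of a platypus, a semi-aquatic mammal with webbed feet.",
        ],
        "beaver": [
            "A beaver is a large rodent with a broad, flat, scaly tail.",
            "A photo of a beaver with dense brown fur and large front teeth.",
        ],
    }

    # One classifier weight per class: encode every generated sentence,
    # normalize, and average (the aggregation step described above).
    class_names = list(llm_descriptions)
    with torch.no_grad():
        weights = []
        for name in class_names:
            tokens = clip.tokenize(llm_descriptions[name]).to(device)
            text_emb = model.encode_text(tokens).float()
            text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
            weights.append(text_emb.mean(dim=0))
        weights = torch.stack(weights)
        weights = weights / weights.norm(dim=-1, keepdim=True)

    # Classify an image by cosine similarity to each class weight.
    # "animal.jpg" is a placeholder path for a test image.
    image = preprocess(Image.open("animal.jpg").convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(image).float()
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        scores = img_emb @ weights.T
    print(class_names[scores.argmax().item()])

In the paper the descriptive sentences come from a large language model rather than being hand-written, but the aggregation is the same idea: averaging many normalized prompt embeddings lets a single class weight reflect many discriminating attributes, with no additional training.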