
What does a platypus look like? Generating customized prompts for zero-shot image classification

Bibliographic Details
Published in: arXiv.org, 2023-12
Main Authors: Pratt, Sarah; Covert, Ian; Liu, Rosanne; Farhadi, Ali
Format: Article
Language: English
Subjects: Classification; Customization; Explicit knowledge; Image classification; Natural language; Sentences
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Source: Publicly Available Content Database
Online Access: https://www.proquest.com/docview/2711616998
Description: Open-vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language at inference time. This natural language, referred to as "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}"), each completed with the category names. This work introduces a simple method for generating higher-accuracy prompts, without relying on any explicit knowledge of the task domain and with far fewer hand-constructed sentences. To achieve this, we combine open-vocabulary models with large language models (LLMs) to create Customized Prompts via Language models (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs to generate many descriptive sentences that capture the important discriminating characteristics of each image category, which allows the model to place greater importance on the corresponding regions of the image when making predictions. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including a gain of more than one percentage point on ImageNet. The method requires no additional training and remains completely zero-shot. Code is available at https://github.com/sarahpratt/CuPL.
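
The description above outlines the CuPL recipe: ask an LLM for descriptive sentences about each category, embed those sentences with an open-vocabulary model such as CLIP, and average the embeddings into one classifier weight per class. The following is a minimal sketch of that pipeline, assuming the open-source clip package (https://github.com/openai/CLIP); the llm_descriptions dictionary stands in for LLM-generated sentences and is illustrative rather than taken from the paper.

    # Minimal CuPL-style zero-shot classifier (a sketch, not the authors' code).
    # Assumes: pip install torch pillow git+https://github.com/openai/CLIP.git
    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Hypothetical LLM outputs: a few descriptive sentences per category,
    # e.g. answers to "Describe what a platypus looks like."
    llm_descriptions = {
        "platypus": [
            "A platypus is a furry brown animal with a flat, duck-like bill.",
            "A photo of a platypus, a semi-aquatic mammal with webbed feet.",
        ],
        "beaver": [
            "A beaver is a large rodent with a broad, flat, scaly tail.",
            "A photo of a beaver with dense brown fur and large front teeth.",
        ],
    }

    # One classifier weight per class: encode every generated sentence,
    # normalize, and average (the aggregation step described above).
    class_names = list(llm_descriptions)
    with torch.no_grad():
        weights = []
        for name in class_names:
            tokens = clip.tokenize(llm_descriptions[name]).to(device)
            text_emb = model.encode_text(tokens).float()
            text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
            weights.append(text_emb.mean(dim=0))
        weights = torch.stack(weights)
        weights = weights / weights.norm(dim=-1, keepdim=True)

    # Classify an image by cosine similarity to each class weight.
    # "animal.jpg" is a placeholder path for a test image.
    image = preprocess(Image.open("animal.jpg").convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(image).float()
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        scores = img_emb @ weights.T
    print(class_names[scores.argmax().item()])

In the paper the descriptive sentences come from a large language model rather than being hand-written, but the aggregation is the same idea: averaging many normalized prompt embeddings lets a single class weight reflect many discriminating attributes, with no additional training.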