More Control for Free! Image Synthesis with Semantic Diffusion Guidance
Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than prior methods, and have been successfully demonstrated in unconditional and class-conditional settings. We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both. Guidance is injected into a pretrained unconditional diffusion model using the gradient of image-text or image matching scores, without re-training the diffusion model. We explore CLIP-based language guidance as well as both content and style-based image guidance in a unified framework. Our text-guided synthesis approach can be applied to datasets without associated text annotations. We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis, synthesis of images related to a style or content reference image, and examples with both textual and image guidance.
Main Authors: | Liu, Xihui; Park, Dong Huk; Azadi, Samaneh; Zhang, Gong; Chopikyan, Arman; Hu, Yuxiao; Shi, Humphrey; Rohrbach, Anna; Darrell, Trevor |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Algorithms: Computational photography; Annotations; Computer vision; image and video synthesis; Image matching; Image synthesis; Noise reduction; Probabilistic logic; Semantics; Vision + language and/or other modalities |
Citations: | Items that cite this one |
Online Access: | Request full text |
container_start_page | 289 |
container_end_page | 299 |
container_title | 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) |
creator | Liu, Xihui; Park, Dong Huk; Azadi, Samaneh; Zhang, Gong; Chopikyan, Arman; Hu, Yuxiao; Shi, Humphrey; Rohrbach, Anna; Darrell, Trevor |
description | Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than prior methods, and have been successfully demonstrated in unconditional and class-conditional settings. We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both. Guidance is injected into a pretrained unconditional diffusion model using the gradient of image-text or image matching scores, without re-training the diffusion model. We explore CLIP-based language guidance as well as both content and style-based image guidance in a unified framework. Our text-guided synthesis approach can be applied to datasets without associated text annotations. We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis, synthesis of images related to a style or content reference image, and examples with both textual and image guidance. |
doi_str_mv | 10.1109/WACV56688.2023.00037 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2642-9381 |
ispartof | 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, p.289-299 |
issn | 2642-9381 |
language | eng |
recordid | cdi_ieee_primary_10030365 |
source | IEEE Xplore All Conference Series |
subjects | Algorithms: Computational photography; Annotations; Computer vision; image and video synthesis; Image matching; Image synthesis; Noise reduction; Probabilistic logic; Semantics; Vision + language and/or other modalities |
title | More Control for Free! Image Synthesis with Semantic Diffusion Guidance |
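The guidance mechanism the abstract describes — shifting a frozen unconditional model's denoising mean by the gradient of a matching score, with no retraining — can be sketched on a toy problem. Everything below is a hypothetical stand-in, not the paper's implementation: the "image" is a single scalar, a quadratic score plays the role of a CLIP image-text similarity, and the identity function stands in for the frozen model's predicted mean.

```python
import random

def matching_score_grad(x, target):
    # Gradient of the toy score F(x) = -(x - target)^2, which pulls x
    # toward the target; a stand-in for the gradient of a CLIP score.
    return -2.0 * (x - target)

def guided_step(x_t, mu_uncond, sigma, target, scale, rng):
    # Classifier-guidance-style update: the unconditional mean is shifted
    # by scale * sigma^2 * grad F, then the usual Gaussian noise is added.
    mu = mu_uncond + scale * sigma ** 2 * matching_score_grad(x_t, target)
    return mu + sigma * rng.gauss(0.0, 1.0)

def sample(scale, seed, target=1.0, steps=50):
    # Run a toy reverse-diffusion chain from pure noise and report the
    # final distance to the guidance target.
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)
    for i in range(steps):
        sigma = 1.0 - i * (0.95 / (steps - 1))  # toy schedule: 1.0 -> 0.05
        x = guided_step(x, x, sigma, target, scale, rng)
    return abs(x - target)

# Average final distance over many seeds, with and without guidance
# (scale=0.0 disables the score-gradient term entirely).
guided = sum(sample(0.5, s) for s in range(100)) / 100
unguided = sum(sample(0.0, s) for s in range(100)) / 100
```

Guided chains end far closer to the target on average than unguided ones, illustrating why the gradient term steers synthesis without touching the diffusion model's weights.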