More Control for Free! Image Synthesis with Semantic Diffusion Guidance

Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than prior methods, and have been successfully demonstrated in unconditional and class-conditional settings. We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both. Guidance is injected into a pretrained unconditional diffusion model using the gradient of image-text or image matching scores, without re-training the diffusion model. We explore CLIP-based language guidance as well as both content and style-based image guidance in a unified framework. Our text-guided synthesis approach can be applied to datasets without associated text annotations. We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis, synthesis of images related to a style or content reference image, and examples with both textual and image guidance.

Bibliographic Details
Main Authors: Liu, Xihui, Park, Dong Huk, Azadi, Samaneh, Zhang, Gong, Chopikyan, Arman, Hu, Yuxiao, Shi, Humphrey, Rohrbach, Anna, Darrell, Trevor
Format: Conference Proceeding
Language: English
container_end_page 299
container_start_page 289
creator Liu, Xihui
Park, Dong Huk
Azadi, Samaneh
Zhang, Gong
Chopikyan, Arman
Hu, Yuxiao
Shi, Humphrey
Rohrbach, Anna
Darrell, Trevor
description Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than prior methods, and have been successfully demonstrated in unconditional and class-conditional settings. We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both. Guidance is injected into a pretrained unconditional diffusion model using the gradient of image-text or image matching scores, without re-training the diffusion model. We explore CLIP-based language guidance as well as both content and style-based image guidance in a unified framework. Our text-guided synthesis approach can be applied to datasets without associated text annotations. We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis, synthesis of images related to a style or content reference image, and examples with both textual and image guidance.
doi_str_mv 10.1109/WACV56688.2023.00037
format conference_proceeding
identifier EISSN: 2642-9381
ispartof 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, p.289-299
issn 2642-9381
language eng
recordid cdi_ieee_primary_10030365
source IEEE Xplore All Conference Series
subjects Algorithms: Computational photography
Annotations
Computer vision
image and video synthesis
Image matching
Image synthesis
Noise reduction
Probabilistic logic
Semantics
Vision + language and/or other modalities
title More Control for Free! Image Synthesis with Semantic Diffusion Guidance
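The guidance mechanism summarized in the abstract, steering a frozen pretrained denoiser with the gradient of a matching score instead of retraining it, can be illustrated with a toy numerical sketch. Everything below is a stand-in and does not come from the paper: `match_score` plays the role of a CLIP-style image-text similarity, a 2-D point plays the role of an image, and the hard-coded `0.9 * x` update plays the role of the frozen denoiser.

```python
import numpy as np

def match_score(x, target):
    # Stand-in for a CLIP-style similarity: higher when x is near the target.
    return -np.sum((x - target) ** 2)

def score_grad(x, target):
    # Analytic gradient of the toy matching score w.r.t. the sample.
    return -2.0 * (x - target)

def guided_sample(target, steps=50, guidance_scale=0.1, seed=0):
    """Reverse-process sketch: the 'denoiser' update is fixed, and guidance
    is injected purely through the gradient of the matching score."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=2)                  # start from pure noise
    for t in range(steps, 0, -1):
        x_denoised = 0.9 * x                # frozen toy "denoiser" update
        # Inject guidance: shift the update along the score gradient.
        x = x_denoised + guidance_scale * score_grad(x_denoised, target)
        x += 0.01 * rng.normal(size=2) * (t / steps)  # residual noise
    return x

sample = guided_sample(target=np.array([1.0, -1.0]))
```

Setting `guidance_scale=0.0` recovers unguided sampling from the same loop, which mirrors the sense in which the guidance is "free": the denoiser itself never changes, only the gradient term added at each step.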