More Control for Free! Image Synthesis with Semantic Diffusion Guidance
Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than prior methods, and have been successfully demonstrated in unconditional and class-conditional settings. We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both. Guidance is injected into a pretrained unconditional diffusion model using the gradient of image-text or image matching scores, without re-training the diffusion model. We explore CLIP-based language guidance as well as both content and style-based image guidance in a unified framework. Our text-guided synthesis approach can be applied to datasets without associated text annotations. We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis, synthesis of images related to a style or content reference image, and examples with both textual and image guidance.
Main Authors: | Liu, Xihui; Park, Dong Huk; Azadi, Samaneh; Zhang, Gong; Chopikyan, Arman; Hu, Yuxiao; Shi, Humphrey; Rohrbach, Anna; Darrell, Trevor |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Algorithms: Computational photography; Annotations; Computer vision; image and video synthesis; Image matching; Image synthesis; Noise reduction; Probabilistic logic; Semantics; Vision + language and/or other modalities |
Citations: | Items that cite this one |
Online Access: | Request full text |
container_start_page | 289 |
container_end_page | 299 |
container_title | 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) |
creator | Liu, Xihui; Park, Dong Huk; Azadi, Samaneh; Zhang, Gong; Chopikyan, Arman; Hu, Yuxiao; Shi, Humphrey; Rohrbach, Anna; Darrell, Trevor |
description | Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than prior methods, and have been successfully demonstrated in unconditional and class-conditional settings. We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both. Guidance is injected into a pretrained unconditional diffusion model using the gradient of image-text or image matching scores, without re-training the diffusion model. We explore CLIP-based language guidance as well as both content and style-based image guidance in a unified framework. Our text-guided synthesis approach can be applied to datasets without associated text annotations. We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis, synthesis of images related to a style or content reference image, and examples with both textual and image guidance. |
doi_str_mv | 10.1109/WACV56688.2023.00037 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2642-9381 |
ispartof | 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, p.289-299 |
issn | 2642-9381 |
language | eng |
recordid | cdi_ieee_primary_10030365 |
source | IEEE Xplore All Conference Series |
subjects | Algorithms: Computational photography; Annotations; Computer vision; image and video synthesis; Image matching; Image synthesis; Noise reduction; Probabilistic logic; Semantics; Vision + language and/or other modalities |
title | More Control for Free! Image Synthesis with Semantic Diffusion Guidance |
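The guidance mechanism the abstract describes — shifting a frozen unconditional model's denoising mean by the gradient of a matching score, with no retraining — can be sketched on a toy problem. Everything below is a hypothetical stand-in, not the paper's implementation: the "image" is a single scalar, a quadratic score plays the role of a CLIP image-text similarity, and the identity function stands in for the frozen model's predicted mean.

```python
import random

def matching_score_grad(x, target):
    # Gradient of the toy score F(x) = -(x - target)^2, which pulls x
    # toward the target; a stand-in for the gradient of a CLIP score.
    return -2.0 * (x - target)

def guided_step(x_t, mu_uncond, sigma, target, scale, rng):
    # Classifier-guidance-style update: the unconditional mean is shifted
    # by scale * sigma^2 * grad F, then the usual Gaussian noise is added.
    mu = mu_uncond + scale * sigma ** 2 * matching_score_grad(x_t, target)
    return mu + sigma * rng.gauss(0.0, 1.0)

def sample(scale, seed, target=1.0, steps=50):
    # Run a toy reverse-diffusion chain from pure noise and report the
    # final distance to the guidance target.
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)
    for i in range(steps):
        sigma = 1.0 - i * (0.95 / (steps - 1))  # toy schedule: 1.0 -> 0.05
        x = guided_step(x, x, sigma, target, scale, rng)
    return abs(x - target)

# Average final distance over many seeds, with and without guidance
# (scale=0.0 disables the score-gradient term entirely).
guided = sum(sample(0.5, s) for s in range(100)) / 100
unguided = sum(sample(0.0, s) for s in range(100)) / 100
```

Guided chains end far closer to the target on average than unguided ones, illustrating why the gradient term steers synthesis without touching the diffusion model's weights.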