
Towards Text-guided 3D Scene Composition

We are witnessing significant breakthroughs in the technology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challe...

Bibliographic Details
Main Authors:	Zhang, Qihang; Wang, Chaoyang; Siarohin, Aliaksandr; Zhuang, Peiye; Xu, Yinghao; Yang, Ceyuan; Lin, Dahua; Zhou, Bolei; Tulyakov, Sergey; Lee, Hsin-Ying
Format:	Conference Proceeding
Language:	English
Subjects:	3D generation; Geometry; Hybrid power systems; Layout; Pipelines; Scene generation; Solid modeling; Text to image; Three-dimensional displays

cited_by
cites
container_end_page	6838
container_issue
container_start_page	6829
container_title
container_volume
creator	Zhang, Qihang; Wang, Chaoyang; Siarohin, Aliaksandr; Zhuang, Peiye; Xu, Yinghao; Yang, Ceyuan; Lin, Dahua; Zhou, Bolei; Tulyakov, Sergey; Lee, Hsin-Ying
description	We are witnessing significant breakthroughs in the technology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challenging as a scene contains multiple 3D objects, diverse and scattered. In this work, we introduce SceneWiz3D - a novel approach to synthesize high-fidelity 3D scenes from text. We marry the locality of objects with globality of scenes by introducing a hybrid 3D representation - explicit for objects and implicit for scenes. Remarkably, an object, being represented explicitly, can be either generated from text using conventional text-to-3D approaches, or provided by users. To configure the layout of the scene and automatically place objects, we apply the Particle Swarm Optimization technique during the optimization process. Furthermore, it is difficult for certain parts of the scene (e.g., corners, occlusion) to receive multi-view supervision, leading to inferior geometry. We incorporate an RGBD panorama diffusion model to mitigate it, resulting in high-quality geometry. Extensive evaluation supports that our approach achieves superior quality over previous approaches, enabling the generation of detailed and view-consistent 3D scenes. Our project website is at https://zqh0253.github.io/SceneWiz3D/.
doi_str_mv	10.1109/CVPR52733.2024.00652
format	conference_proceeding
fullrecord	<record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10654954</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10654954</ieee_id><sourcerecordid>10654954</sourcerecordid><originalsourceid>FETCH-LOGICAL-i106t-d62d62033b22d20354a34d29757342187c35e1ca55a655d4081d5e4b19f76d1b3</originalsourceid><addsrcrecordid>eNotjMtKxDAUQKMgOIz9g1l06ab1PnKTZin1CQOKVrdD2mQk4kyHtqL-vQWFAwfO4ii1QigRwV3Ur49PQpa5JCBdAhihI5U56yoWYOG5HKsFiZXCgpVTlY3jOwAwIRpXLdR503_5IYx5E7-n4u0zhRhyvsqfu7iPed3vDv2YptTvz9TJ1n-MMfv3Ur3cXDf1XbF-uL2vL9dFQjBTEQzNAHNLFGaL9qwDOSuWNWFlO5aInRfxRiRoqDBI1C26rTUBW16q1d83xRg3hyHt_PCzmd-inWj-BeHvQDw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Towards Text-guided 3D Scene Composition</title><source>IEEE Xplore All Conference Series</source><creator>Zhang, Qihang ; Wang, Chaoyang ; Siarohin, Aliaksandr ; Zhuang, Peiye ; Xu, Yinghao ; Yang, Ceyuan ; Lin, Dahua ; Zhou, Bolei ; Tulyakov, Sergey ; Lee, Hsin-Ying</creator><creatorcontrib>Zhang, Qihang ; Wang, Chaoyang ; Siarohin, Aliaksandr ; Zhuang, Peiye ; Xu, Yinghao ; Yang, Ceyuan ; Lin, Dahua ; Zhou, Bolei ; Tulyakov, Sergey ; Lee, Hsin-Ying</creatorcontrib><description>We are witnessing significant breakthroughs in the tech-nology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challenging as a scene contains multiple 3D objects, diverse and scattered. In this work, we introduce SceneWiz3D - a novel approach to synthesize high-fidelity 3D scenes from text. We marry the locality of objects with globality of scenes by introducing a hybrid 3D representation - explicit for objects and implicit for scenes. 
Remarkably, an object, being represented explicitly, can be either generated from text using conventional text-to-3D approaches, or provided by users. To configure the layout of the scene and automatically place objects, we apply the Particle Swarm Optimization technique during the optimization process. Furthermore, it is difficult for certain parts of the scene (e.g., corners, occlusion) to receive multi-view supervision, leading to inferior geometry. We incor-porate an RGBD panorama diffusion model to mitigate it, resulting in high-quality geometry. Extensive evaluation supports that our approach achieves superior quality over previous approaches, enabling the generation of detailed and view-consistent 3D scenes. Our project website is at https://zqh0253.github.io/SceneWiz3D/.</description><identifier>EISSN: 2575-7075</identifier><identifier>EISBN: 9798350353006</identifier><identifier>DOI: 10.1109/CVPR52733.2024.00652</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>3D generation ; Geometry ; Hybrid power systems ; Layout ; Pipelines ; Scene generation ; Solid modeling ; Text to image ; Three-dimensional displays</subject><ispartof>2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, p.6829-6838</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10654954$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10654954$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhang, Qihang</creatorcontrib><creatorcontrib>Wang, Chaoyang</creatorcontrib><creatorcontrib>Siarohin, 
Aliaksandr</creatorcontrib><creatorcontrib>Zhuang, Peiye</creatorcontrib><creatorcontrib>Xu, Yinghao</creatorcontrib><creatorcontrib>Yang, Ceyuan</creatorcontrib><creatorcontrib>Lin, Dahua</creatorcontrib><creatorcontrib>Zhou, Bolei</creatorcontrib><creatorcontrib>Tulyakov, Sergey</creatorcontrib><creatorcontrib>Lee, Hsin-Ying</creatorcontrib><title>Towards Text-guided 3D Scene Composition</title><title>2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</title><addtitle>CVPR</addtitle><description>We are witnessing significant breakthroughs in the tech-nology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challenging as a scene contains multiple 3D objects, diverse and scattered. In this work, we introduce SceneWiz3D - a novel approach to synthesize high-fidelity 3D scenes from text. We marry the locality of objects with globality of scenes by introducing a hybrid 3D representation - explicit for objects and implicit for scenes. Remarkably, an object, being represented explicitly, can be either generated from text using conventional text-to-3D approaches, or provided by users. To configure the layout of the scene and automatically place objects, we apply the Particle Swarm Optimization technique during the optimization process. Furthermore, it is difficult for certain parts of the scene (e.g., corners, occlusion) to receive multi-view supervision, leading to inferior geometry. We incor-porate an RGBD panorama diffusion model to mitigate it, resulting in high-quality geometry. Extensive evaluation supports that our approach achieves superior quality over previous approaches, enabling the generation of detailed and view-consistent 3D scenes. 
Our project website is at https://zqh0253.github.io/SceneWiz3D/.</description><subject>3D generation</subject><subject>Geometry</subject><subject>Hybrid power systems</subject><subject>Layout</subject><subject>Pipelines</subject><subject>Scene generation</subject><subject>Solid modeling</subject><subject>Text to image</subject><subject>Three-dimensional displays</subject><issn>2575-7075</issn><isbn>9798350353006</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2024</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotjMtKxDAUQKMgOIz9g1l06ab1PnKTZin1CQOKVrdD2mQk4kyHtqL-vQWFAwfO4ii1QigRwV3Ur49PQpa5JCBdAhihI5U56yoWYOG5HKsFiZXCgpVTlY3jOwAwIRpXLdR503_5IYx5E7-n4u0zhRhyvsqfu7iPed3vDv2YptTvz9TJ1n-MMfv3Ur3cXDf1XbF-uL2vL9dFQjBTEQzNAHNLFGaL9qwDOSuWNWFlO5aInRfxRiRoqDBI1C26rTUBW16q1d83xRg3hyHt_PCzmd-inWj-BeHvQDw</recordid><startdate>20240616</startdate><enddate>20240616</enddate><creator>Zhang, Qihang</creator><creator>Wang, Chaoyang</creator><creator>Siarohin, Aliaksandr</creator><creator>Zhuang, Peiye</creator><creator>Xu, Yinghao</creator><creator>Yang, Ceyuan</creator><creator>Lin, Dahua</creator><creator>Zhou, Bolei</creator><creator>Tulyakov, Sergey</creator><creator>Lee, Hsin-Ying</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>20240616</creationdate><title>Towards Text-guided 3D Scene Composition</title><author>Zhang, Qihang ; Wang, Chaoyang ; Siarohin, Aliaksandr ; Zhuang, Peiye ; Xu, Yinghao ; Yang, Ceyuan ; Lin, Dahua ; Zhou, Bolei ; Tulyakov, Sergey ; Lee, Hsin-Ying</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i106t-d62d62033b22d20354a34d29757342187c35e1ca55a655d4081d5e4b19f76d1b3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2024</creationdate><topic>3D 
generation</topic><topic>Geometry</topic><topic>Hybrid power systems</topic><topic>Layout</topic><topic>Pipelines</topic><topic>Scene generation</topic><topic>Solid modeling</topic><topic>Text to image</topic><topic>Three-dimensional displays</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Qihang</creatorcontrib><creatorcontrib>Wang, Chaoyang</creatorcontrib><creatorcontrib>Siarohin, Aliaksandr</creatorcontrib><creatorcontrib>Zhuang, Peiye</creatorcontrib><creatorcontrib>Xu, Yinghao</creatorcontrib><creatorcontrib>Yang, Ceyuan</creatorcontrib><creatorcontrib>Lin, Dahua</creatorcontrib><creatorcontrib>Zhou, Bolei</creatorcontrib><creatorcontrib>Tulyakov, Sergey</creatorcontrib><creatorcontrib>Lee, Hsin-Ying</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhang, Qihang</au><au>Wang, Chaoyang</au><au>Siarohin, Aliaksandr</au><au>Zhuang, Peiye</au><au>Xu, Yinghao</au><au>Yang, Ceyuan</au><au>Lin, Dahua</au><au>Zhou, Bolei</au><au>Tulyakov, Sergey</au><au>Lee, Hsin-Ying</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Towards Text-guided 3D Scene Composition</atitle><btitle>2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</btitle><stitle>CVPR</stitle><date>2024-06-16</date><risdate>2024</risdate><spage>6829</spage><epage>6838</epage><pages>6829-6838</pages><eissn>2575-7075</eissn><eisbn>9798350353006</eisbn><coden>IEEPAD</coden><abstract>We are witnessing significant breakthroughs in the tech-nology for generating 3D objects from text. 
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challenging as a scene contains multiple 3D objects, diverse and scattered. In this work, we introduce SceneWiz3D - a novel approach to synthesize high-fidelity 3D scenes from text. We marry the locality of objects with globality of scenes by introducing a hybrid 3D representation - explicit for objects and implicit for scenes. Remarkably, an object, being represented explicitly, can be either generated from text using conventional text-to-3D approaches, or provided by users. To configure the layout of the scene and automatically place objects, we apply the Particle Swarm Optimization technique during the optimization process. Furthermore, it is difficult for certain parts of the scene (e.g., corners, occlusion) to receive multi-view supervision, leading to inferior geometry. We incor-porate an RGBD panorama diffusion model to mitigate it, resulting in high-quality geometry. Extensive evaluation supports that our approach achieves superior quality over previous approaches, enabling the generation of detailed and view-consistent 3D scenes. Our project website is at https://zqh0253.github.io/SceneWiz3D/.</abstract><pub>IEEE</pub><doi>10.1109/CVPR52733.2024.00652</doi><tpages>10</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	EISSN: 2575-7075
ispartof	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, p.6829-6838
issn	2575-7075
language	eng
recordid	cdi_ieee_primary_10654954
source	IEEE Xplore All Conference Series
subjects	3D generation; Geometry; Hybrid power systems; Layout; Pipelines; Scene generation; Solid modeling; Text to image; Three-dimensional displays
title	Towards Text-guided 3D Scene Composition
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T20%3A58%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Towards%20Text-guided%203D%20Scene%20Composition&rft.btitle=2024%20IEEE/CVF%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition%20(CVPR)&rft.au=Zhang,%20Qihang&rft.date=2024-06-16&rft.spage=6829&rft.epage=6838&rft.pages=6829-6838&rft.eissn=2575-7075&rft.coden=IEEPAD&rft_id=info:doi/10.1109/CVPR52733.2024.00652&rft.eisbn=9798350353006&rft_dat=%3Cieee_CHZPO%3E10654954%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i106t-d62d62033b22d20354a34d29757342187c35e1ca55a655d4081d5e4b19f76d1b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10654954&rfr_iscdi=true
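
The url field above is an OpenURL (ctx_ver=Z39.88-2004) whose query string repeats the citation as percent-encoded rft.* key/value pairs. As a minimal sketch of how those pairs might be read back out, the Python below uses only the standard library. The helper name citation_fields and the abbreviated OPENURL constant are illustrative assumptions, not part of the record; the constant keeps only a handful of the pairs from the full url.

```python
from urllib.parse import urlsplit, parse_qs

# Abbreviated copy of the url field: only a few of its key/value pairs are kept
# here for readability (an assumption of this sketch, not the full record URL).
OPENURL = (
    "http://sfxeu10.hosted.exlibrisgroup.com/loughborough"
    "?ctx_ver=Z39.88-2004"
    "&rft.genre=proceeding"
    "&rft.atitle=Towards%20Text-guided%203D%20Scene%20Composition"
    "&rft.au=Zhang,%20Qihang"
    "&rft.date=2024-06-16"
    "&rft.spage=6829&rft.epage=6838"
    "&rft_id=info:doi/10.1109/CVPR52733.2024.00652"
    "&rft_id=info:oai/"
)


def citation_fields(url):
    """Return the rft.* referent fields, plus the DOI if one is present.

    Hypothetical helper: the name and return shape are choices made for
    this example only.
    """
    # parse_qs percent-decodes values and collects repeated keys into lists.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)

    # Keep the first value of each rft.* key and drop the "rft." prefix.
    fields = {
        key[len("rft."):]: values[0]
        for key, values in query.items()
        if key.startswith("rft.")
    }

    # rft_id is repeated in the record; pick out the entry carrying the DOI.
    doi_prefix = "info:doi/"
    for identifier in query.get("rft_id", []):
        if identifier.startswith(doi_prefix):
            fields["doi"] = identifier[len(doi_prefix):]
    return fields


fields = citation_fields(OPENURL)
print(fields["atitle"], fields["doi"])
```

Because parse_qs returns a list for every key, repeated keys such as rft_id survive intact, which is why the DOI is found by scanning the list rather than by taking its first element.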