Fine-tuning the Diffusion Model and Distilling Informative Priors for Sparse-view 3D Reconstruction
Main Authors:
Format: Conference Proceeding
Language: English
Summary: 3D reconstruction methods such as Neural Radiance Fields (NeRFs) can optimize high-quality 3D representations from images. However, NeRF requires a large number of multi-view images, which makes its application to real-world scenarios challenging. In this work, we propose a method that reconstructs real-world scenes from a few input images and a simple text prompt. Specifically, we fine-tune a pretrained diffusion model to constrain its powerful priors to the visual inputs and generate 3D-aware images, using the coarse renderings obtained from the input images as the image condition and the text prompt as the text condition. Our fine-tuning method saves a significant amount of training time and GPU memory while still producing credible results. Moreover, to give our method self-evaluation capability, we design a semantic switch that filters out generated images that do not match the real scene, ensuring that only informative priors from the fine-tuned diffusion model are distilled into the 3D model. The semantic switch can be used as a plug-in and improves performance by 13%. We evaluate our approach on a real-world dataset and demonstrate competitive results compared to existing sparse-view 3D reconstruction methods. Please see our project page for more visualizations and code: https://bityia.github.io/FDfusion.
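The abstract does not describe how the semantic switch is implemented. As a purely illustrative sketch (not the paper's code), one way such a filter could work is to score each generated image against the text prompt with an off-the-shelf CLIP model and discard low-scoring images before distillation. The function name `semantic_switch`, the model checkpoint, and the threshold below are assumptions for illustration only.

```python
# Illustrative sketch of a prompt-based image filter ("semantic switch").
# Not the authors' implementation; model choice and threshold are assumed.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def semantic_switch(images, prompt, threshold=0.25):
    """Keep only generated images whose CLIP similarity to the text prompt
    exceeds `threshold` (hypothetical cutoff)."""
    inputs = processor(text=[prompt], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Cosine similarity between projected image and text embeddings.
    img_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    sims = (img_emb @ txt_emb.T).squeeze(-1)
    return [img for img, s in zip(images, sims.tolist()) if s > threshold]

# Usage (PIL images from the fine-tuned diffusion model):
# kept = semantic_switch(generated_images, "a photo of a garden scene")
```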
ISSN: 2153-0866
DOI: 10.1109/IROS58592.2024.10802155