We propose MVDream, a multi-view diffusion model that generates geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets and a multi-view dataset rendered from 3D assets, the resulting multi-view diffusion model achieves both the generalizability of 2D diffusion and the consistency of 3D data. Such a model can thus be applied as a multi-view prior for 3D generation via Score Distillation Sampling, where it greatly improves the stability of existing 2D-lifting methods by resolving the 3D consistency problem. Finally, we show that the multi-view diffusion model can also be fine-tuned in a few-shot setting for personalized 3D generation, i.e., the DreamBooth3D application, where consistency is maintained after learning the subject identity.
Our multi-view diffusion model can be applied as a multi-view prior for 3D generation via Score Distillation Sampling.
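To make the role of the multi-view prior concrete, the following is a minimal, self-contained sketch of Score Distillation Sampling with a multi-view denoiser. The ToyMultiViewDenoiser, the noise schedule, and the randomly initialized "rendered views" are placeholder stand-ins for illustration only; they are not MVDream's network, camera conditioning, or the threestudio pipeline.

import torch
import torch.nn as nn

class ToyMultiViewDenoiser(nn.Module):
    """Toy stand-in for a multi-view diffusion model (predicts noise for all views jointly)."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, noisy_views, t, text_emb, camera_emb):
        # A real model would also attend across views and condition on t, text, and cameras.
        return self.net(noisy_views)


def sds_loss(rendered_views, denoiser, text_emb, camera_emb,
             alphas_cumprod, t_min=20, t_max=980):
    """Score Distillation Sampling loss on a batch of V rendered views.

    rendered_views: (V, 3, H, W) tensor that carries gradients back to the 3D scene parameters.
    """
    t = torch.randint(t_min, t_max, (1,))
    alpha_bar = alphas_cumprod[t].view(1, 1, 1, 1)
    noise = torch.randn_like(rendered_views)
    noisy = alpha_bar.sqrt() * rendered_views + (1 - alpha_bar).sqrt() * noise

    # The prior only scores the noisy renders; no gradient flows through the denoiser.
    with torch.no_grad():
        noise_pred = denoiser(noisy, t, text_emb, camera_emb)

    # SDS gradient: w(t) * (eps_hat - eps); detach it and re-attach to the renders
    # so backprop pushes the 3D parameters toward the prior's score.
    w = 1.0 - alpha_bar
    grad = w * (noise_pred - noise)
    return (grad.detach() * rendered_views).sum()


if __name__ == "__main__":
    V, H, W = 4, 64, 64
    views = torch.rand(V, 3, H, W, requires_grad=True)   # proxy for differentiable renders
    denoiser = ToyMultiViewDenoiser()
    betas = torch.linspace(1e-4, 0.02, 1000)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    loss = sds_loss(views, denoiser, text_emb=None, camera_emb=None,
                    alphas_cumprod=alphas_cumprod)
    loss.backward()
    print(views.grad.shape)  # gradients flow back to the scene/render parameters

The key point the sketch illustrates is that the denoiser scores all views of one object jointly, so the distilled gradient pushes every rendered view toward a single consistent 3D interpretation rather than treating each view independently.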
MVDream generates objects and scenes in a multi-view consistent way.
We collected 40 prompts from different sources to compare with other text-to-3D methods. A fixed default configuration in threestudio is used for all prompts, without hyper-parameter tuning.
Methods compared: Dreamfusion-IF, Magic3D-IF-SD, Text2Mesh-IF, ProlificDreamer, and Ours.
Example prompts: "an astronaut riding a horse", "baby yoda in the style of Mormookiee", "Handpainted watercolor windmill, hand-painted", "Darth Vader helmet, highly detailed".
Like DreamBooth3D, the multi-view diffusion model can be fine-tuned on few-shot data of the same subject for personalized generation, with a much simpler strategy.
Left: "Photo of a [v] dog"
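For illustration, below is a generic DreamBooth-style fine-tuning loop on a handful of subject photos; the model argument can be any noise-prediction network with the same call signature, e.g. the toy denoiser from the sketch above. The exact recipe in the paper (which weights are updated and how the original model is preserved) may differ, and the function name finetune_on_subject is hypothetical.

import torch
import torch.nn.functional as F

def finetune_on_subject(denoiser, subject_images, text_emb, alphas_cumprod,
                        steps=200, lr=1e-5):
    """Fine-tune a (multi-view) diffusion model on a few images of one subject.

    subject_images: (N, 3, H, W) tensor of the few-shot photos, e.g. "photo of a [v] dog".
    """
    opt = torch.optim.AdamW(denoiser.parameters(), lr=lr)
    for _ in range(steps):
        # Pick one subject photo and a random diffusion timestep.
        idx = torch.randint(0, subject_images.shape[0], (1,))
        x0 = subject_images[idx]
        t = torch.randint(0, alphas_cumprod.shape[0], (1,))
        a = alphas_cumprod[t].view(1, 1, 1, 1)
        noise = torch.randn_like(x0)
        xt = a.sqrt() * x0 + (1 - a).sqrt() * noise

        # Standard denoising objective on the subject photos; the identity token
        # "[v]" lives in text_emb. A preservation term (e.g. on the original
        # weights or class images) can be added to retain generalization.
        noise_pred = denoiser(xt, t, text_emb, camera_emb=None)
        loss = F.mse_loss(noise_pred, noise)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return denoiser

After this fine-tuning, the personalized model can be plugged back into the same score-distillation loop above to produce 3D assets of the learned subject.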
@article{shi2023MVDream,
  author  = {Shi, Yichun and Wang, Peng and Ye, Jianglong and Mai, Long and Li, Kejie and Yang, Xiao},
  title   = {MVDream: Multi-view Diffusion for 3D Generation},
  journal = {arXiv:2308.16512},
  year    = {2023},
}