MVDream: Multi-view Diffusion for 3D Generation


Abstract

We propose MVDream, a multi-view diffusion model that generates geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets together with a multi-view dataset rendered from 3D assets, the resulting model achieves both the generalizability of 2D diffusion and the consistency of 3D data. It can therefore serve as a multi-view prior for 3D generation via Score Distillation Sampling, where it greatly improves the stability of existing 2D-lifting methods by solving the 3D consistency problem. Finally, we show that the multi-view diffusion model can also be fine-tuned in a few-shot setting for personalized 3D generation (i.e., a DreamBooth3D-style application), where consistency is maintained after the model learns the subject identity.

[Figure: architecture]

Multi-view Score Distillation

Our multi-view diffusion model can be applied as a 3D prior for 3D generation with Score Distillation Sampling.
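
To make the idea concrete, here is a minimal toy sketch of a score-distillation update driven by several views at once. It is not the MVDream implementation: the renderer (`render`), its backward pass (`render_vjp`), and the noise predictor (`toy_noise_predictor`) are hypothetical stand-ins. In particular, the predictor is an oracle that already knows a clean target image for every view, whereas a real multi-view diffusion model would predict the noise from the noisy views and the text prompt. The sketch only illustrates the structure of the update: render several views, add noise, let one model see all views jointly, and push the residual (predicted noise minus injected noise) back through the renderer to the shared 3D parameters.

```python
import math
import random


def render(theta, view):
    """Toy renderer (assumption): view k sees the scene cyclically shifted by k."""
    n = len(theta)
    return [theta[(i + view) % n] for i in range(n)]


def render_vjp(grad_image, view):
    """Map an image-space gradient back to theta (transpose of the shift)."""
    n = len(grad_image)
    grad_theta = [0.0] * n
    for i, g in enumerate(grad_image):
        grad_theta[(i + view) % n] += g
    return grad_theta


def toy_noise_predictor(noisy_views, targets, alpha_bar):
    """Stand-in for a multi-view diffusion model (assumption).

    It receives all noisy views together and returns the noise implied by
    known clean targets. A real model would have no access to the targets.
    """
    s, r = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[(xt - s * x0) / r for xt, x0 in zip(view_t, view_0)]
            for view_t, view_0 in zip(noisy_views, targets)]


def sds_gradient(theta, targets, alpha_bar, rng, weight=1.0):
    """Score-distillation gradient averaged over all views: w * (eps_hat - eps) * dx/dtheta."""
    s, r = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    images = [render(theta, v) for v in range(len(targets))]
    noises = [[rng.gauss(0.0, 1.0) for _ in img] for img in images]
    noisy = [[s * x + r * e for x, e in zip(img, eps)]
             for img, eps in zip(images, noises)]
    predicted = toy_noise_predictor(noisy, targets, alpha_bar)  # joint over views

    grad = [0.0] * len(theta)
    for v, (eps_hat, eps) in enumerate(zip(predicted, noises)):
        residual = [weight * (p - e) for p, e in zip(eps_hat, eps)]
        for i, g in enumerate(render_vjp(residual, v)):
            grad[i] += g / len(targets)
    return grad


def optimize_scene(targets, n_params, steps=300, lr=0.1, seed=0):
    """Plain gradient descent on the shared parameters using the toy SDS gradient."""
    rng = random.Random(seed)
    theta = [0.0] * n_params
    for _ in range(steps):
        alpha_bar = rng.uniform(0.2, 0.8)  # random noise level per step
        grad = sds_gradient(theta, targets, alpha_bar, rng)
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    scene = [0.0, 1.0, -0.5, 2.0]                      # hidden "true" scene
    targets = [render(scene, v) for v in range(4)]     # one clean image per view
    print(optimize_scene(targets, n_params=len(scene)))
```

Because every view's residual is accumulated into the same parameters, the views cannot drift apart independently; in this toy setup the parameters converge to the single scene that explains all target views.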


Example generated objects

MVDream generates objects and scenes in a multi-view consistent way.

Flying Dragon, highly detailed, breathing fire
Viking axe, fantasy, weapon, blender, 8k, HD
mecha vampire girl chibi
higly detailed, majestic royal tall ship, ...
a cute fluffy dog, 4K, HD, raw
Gandalf smiling, white hair, ...

Comparison Results

We collected 40 prompts from different sources to compare with other text-to-3D methods. All results use a fixed default configuration in threestudio, with no per-prompt hyperparameter tuning.

Methods compared: Dreamfusion-IF, Magic3D-IF-SD, Text2Mesh-IF, ProlificDreamer, Ours

Prompts shown:

an astronaut riding a horse
baby yoda in the style of Mormookiee
Handpainted watercolor windmill, hand-painted
Darth Vader helmet, highly detailed



MV DreamBooth

As in DreamBooth3D, our multi-view diffusion model can be fine-tuned with few-shot data of the same subject for personalized generation, but with a much simpler strategy.

Left: "Photo of a [v] dog"

Photo of a [v] dog
Photo of a [v] dog jumping
Photo of a [v] dog sitting on a rainbow carpet
Photo of a [v] dog sleeping

Citation

@article{shi2023MVDream,
  author = {Shi, Yichun and Wang, Peng and Ye, Jianglong and Mai, Long and Li, Kejie and Yang, Xiao},
  title = {MVDream: Multi-view Diffusion for 3D Generation},
  journal = {arXiv:2308.16512},
  year = {2023},
}