SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections
TPAMI 2023

Zhaoxi Chen
Guangcong Wang
Ziwei Liu

TL;DR: SceneDreamer learns to generate unbounded 3D scenes from in-the-wild 2D image collections.
Our method can synthesize diverse landscapes across different styles, with 3D consistency, well-defined depth, and free camera trajectory.

Abstract

In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our framework starts from an efficient bird's-eye-view (BEV) representation generated from simplex noise, which consists of a height field and a semantic field. The height field represents the surface elevation of 3D scenes, while the semantic field provides detailed scene semantics. This BEV scene representation enables 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Furthermore, we propose a novel generative neural hash grid to parameterize the latent space given 3D positions and scene semantics, which aims to encode generalizable features across scenes and align content. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and its superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
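To make the BEV representation more concrete, the following is a minimal sketch, not the authors' implementation. It builds a height field and a semantic field on an N x N grid, which is why storage grows quadratically rather than cubically with resolution. Several parts are assumptions made for illustration: smooth value noise (a bilinearly interpolated random lattice) stands in for simplex noise, and the semantic labels come from hypothetical height thresholds. The function names `value_noise_2d` and `make_bev` are also hypothetical.

```python
import numpy as np


def value_noise_2d(size, cells, rng):
    """Smooth 2D noise in [0, 1): bilinear interpolation of a random lattice.

    NOTE: this is value noise, used here as a simple stand-in for simplex noise.
    """
    lattice = rng.random((cells + 1, cells + 1))
    # Sample positions in lattice coordinates, strictly inside [0, cells).
    coords = np.linspace(0.0, cells, size, endpoint=False)
    i0 = np.floor(coords).astype(int)
    t = coords - i0
    t = t * t * (3.0 - 2.0 * t)  # smoothstep fade to soften lattice edges
    ty, tx = t[:, None], t[None, :]
    y0, x0 = i0[:, None], i0[None, :]
    top = lattice[y0, x0] * (1 - tx) + lattice[y0, x0 + 1] * tx
    bottom = lattice[y0 + 1, x0] * (1 - tx) + lattice[y0 + 1, x0 + 1] * tx
    return top * (1 - ty) + bottom * ty


def make_bev(size=256, seed=0, octaves=4):
    """Return (height, semantic) maps, each of shape (size, size)."""
    rng = np.random.default_rng(seed)
    height = np.zeros((size, size))
    total_amp = 0.0
    for octave in range(octaves):
        amp = 0.5 ** octave
        height += amp * value_noise_2d(size, 4 * 2 ** octave, rng)
        total_amp += amp
    height /= total_amp  # normalize back to [0, 1)

    # Hypothetical label scheme: 0 = water, 1 = sand, 2 = grass, 3 = rock.
    thresholds = np.array([0.40, 0.55, 0.70])
    semantic = np.digitize(height, thresholds)
    return height, semantic


if __name__ == "__main__":
    height, semantic = make_bev(size=128, seed=42)
    print("BEV entries:", height.size + semantic.size)  # 2 * N^2
    print("Dense voxel entries at the same resolution:", 128 ** 3)  # N^3
```

Geometry (the height map) and semantics (the label map) live in separate arrays here, which mirrors the disentanglement described above, although how the actual model derives its semantic field is not specified in this text.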

Gallery

Entering full screen is recommended for better visual quality.

Framework

Given simplex noise and a style code as input, our model synthesizes large-scale 3D scenes in which the camera can move freely and produce realistic renderings. We first derive our BEV scene representation, which consists of a height field and a semantic field. Then, we use a generative neural hash grid to parameterize the hyperspace of space-varied and scene-varied latent features given scene semantics and 3D position. Finally, a style-modulated renderer is employed to blend latent features and render 2D images via volume rendering. The entire framework is trained end-to-end on in-the-wild 2D images.
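The final volume rendering step can be illustrated with the standard quadrature used by neural radiance field methods. This is a sketch under assumptions rather than the renderer described above: the per-sample densities and colors would in practice be decoded from hash-grid features by the style-modulated renderer, whereas here they are plain input arrays, and the function name `composite_ray` is hypothetical.

```python
import numpy as np


def composite_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray, ordered front to back.

    sigmas: (S,) non-negative densities at each sample
    colors: (S, 3) RGB color at each sample
    deltas: (S,) distance covered by each sample along the ray
    Returns the composited RGB (3,) and the per-sample weights (S,).
    """
    # Opacity contributed by each sample segment.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas[:-1])])
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights


if __name__ == "__main__":
    sigmas = np.array([0.0, 2.0, 5.0])
    colors = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
    deltas = np.full(3, 0.5)
    rgb, weights = composite_ray(sigmas, colors, deltas)
    print("rgb:", rgb)
    print("weights:", weights)
```

Because every operation is differentiable, a discriminator's gradient on the rendered image can flow back to the densities and colors, which is what makes adversarial training from 2D images possible in this kind of pipeline.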

Video

Citation

@ARTICLE {chen2023sd,
author = {Z. Chen and G. Wang and Z. Liu},
journal = {IEEE Transactions on Pattern Analysis \& Machine Intelligence},
title = {SceneDreamer: Unbounded 3D Scene Generation From 2D Image Collections},
year = {2023},
volume = {45},
number = {12},
issn = {1939-3539},
pages = {15562-15576},
doi = {10.1109/TPAMI.2023.3321857},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {dec}
}

Acknowledgements

This work is supported by the National Research Foundation, Singapore under its AI Singapore Programme, NTU NAP, MOE AcRF Tier 2 (T2EP20221-0033), and under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
The website template is borrowed from Mip-NeRF.
