Task: 3D-aware Generation
Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. We introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
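The hybrid explicit-implicit backbone mentioned above is the tri-plane representation: a StyleGAN2-style 2D CNN generator outputs three axis-aligned feature planes, and a 3D point is decoded by projecting it onto each plane, bilinearly sampling a feature, summing the three features, and passing the result through a small MLP before volume rendering and super-resolution. The snippet below is a minimal, illustrative sketch of the sampling step only; the tensor shapes and function names are assumptions, not the code used in this repository.

```python
import torch
import torch.nn.functional as F


def sample_triplane(planes: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """planes: (3, C, H, W) feature planes (XY, XZ, YZ) produced by a 2D generator.
    points: (N, 3) 3D coordinates normalized to [-1, 1].
    Returns per-point features of shape (N, C)."""
    # Project each 3D point onto the three axis-aligned planes.
    projections = (points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]])
    feats = []
    for plane, coords in zip(planes, projections):
        grid = coords.view(1, -1, 1, 2)                     # (1, N, 1, 2)
        sampled = F.grid_sample(
            plane[None], grid, mode='bilinear', align_corners=False)
        feats.append(sampled.view(plane.shape[0], -1).t())  # (N, C)
    # Aggregate the three plane features by summation, as described in the paper.
    return feats[0] + feats[1] + feats[2]
```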
Model | Dataset | Comment | FID50k | FID50k-Camera | Download |
---|---|---|---|---|---|
ShapeNet-Car | ShapeNet-Car | official weight | 5.6573 | 5.2325 | model
AFHQ | AFHQ | official weight | 2.9134 | 6.4213 | model |
FFHQ | FFHQ | official weight | 4.3076 | 6.4453 | model |
`FID50k-Camera` denotes images generated with randomly sampled camera positions. `FID50k` denotes images generated with camera positions randomly sampled from the original dataset.
All metrics are evaluated under FP32, and it is hard to predict how they would change under FP16. For example, if FP16 is used in the super-resolution module of the FFHQ model, the output images are slightly blurrier than those generated under FP32, yet the FID (4.03) is better than the FP32 result.
You can use the following command to generate a sequence of images with continuously changing camera positions as input.
```shell
python demo/mmediting_inference_demo.py --model-name eg3d \
    --model-config configs/eg3d/eg3d_cvt-official-rgb_afhq-512x512.py \
    --model-ckpt https://download.openmmlab.com/mmediting/eg3d/eg3d_cvt-official-rgb_afhq-512x512-ca1dd7c9.pth \
    --result-out-dir eg3d_output \  # save images and videos to `eg3d_output`
    --interpolation camera \  # interpolate the camera position only
    --num-images 100  # generate 100 images during interpolation
```
The following video will be saved to `eg3d_output`.
combine_seed2022.mp4
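For intuition, camera interpolation amounts to sweeping camera-to-world poses along a smooth trajectory around the subject. The sketch below is illustrative only; the demo script builds its poses internally, and the `radius`, `height`, and look-at convention used here are assumptions.

```python
import numpy as np


def look_at(cam_pos, target=np.zeros(3), up=np.array([0.0, 1.0, 0.0])):
    """Build a 4x4 camera-to-world matrix for a camera at `cam_pos` looking at `target`."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0], c2w[:3, 1], c2w[:3, 2] = right, true_up, -forward
    c2w[:3, 3] = cam_pos
    return c2w


def circular_trajectory(num_images=100, radius=2.7, height=0.2):
    """Camera poses evenly spaced on a circle around the origin."""
    angles = np.linspace(0.0, 2.0 * np.pi, num_images, endpoint=False)
    return [
        look_at(np.array([radius * np.cos(a), height, radius * np.sin(a)]))
        for a in angles
    ]
```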
To interpolate the camera position and style code at the same time, you can use the following command.
```shell
python demo/mmediting_inference_demo.py --model-name eg3d \
    --model-config configs/eg3d/eg3d_cvt-official-rgb_ffhq-512x512.py \
    --model-ckpt https://download.openmmlab.com/mmediting/eg3d/eg3d_cvt-official-rgb_ffhq-512x512-5a0ddcb6.pth \
    --result-out-dir eg3d_output \  # save images and videos to `eg3d_output`
    --interpolation both \  # interpolate both the camera position and the conditioning style code
    --num-images 100 \  # generate 100 images during interpolation
    --seed 233  # set the random seed to 233
```
ffhq-both-seed233.mp4
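Conceptually, `--interpolation both` also blends the latent (style) code from frame to frame while the camera moves. Below is a minimal sketch of linear latent interpolation, assuming one latent vector per frame; this is not the demo's actual implementation, and in practice interpolation is often done in the mapped latent space instead.

```python
import numpy as np


def interpolate_latents(z_start, z_end, num_images=100):
    """Linearly blend two latent codes into a sequence of per-frame codes."""
    weights = np.linspace(0.0, 1.0, num_images)
    return [(1.0 - w) * z_start + w * z_end for w in weights]
```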
If you only want to save a video of the depth maps, you can use the following command:
```shell
python demo/mmediting_inference_demo.py --model-name eg3d \
    --model-config configs/eg3d/eg3d_cvt-official-rgb_shapenet-128x128.py \
    --model-ckpt https://download.openmmlab.com/mmediting/eg3d/eg3d_cvt-official-rgb_shapenet-128x128-85757f4d.pth \
    --result-out-dir eg3d_output \  # save images and videos to `eg3d_output`
    --interpolation camera \  # interpolate the camera position only
    --num-images 100 \  # generate 100 images during interpolation
    --vis-mode depth  # only visualize the depth maps
```
car-depth_seed0.mp4
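As a rough illustration of what `--vis-mode depth` produces, raw depth maps need to be normalized before they can be written out as video frames. The sketch below uses `imageio` only as an example writer and is not necessarily what the demo script does internally.

```python
import imageio.v2 as imageio
import numpy as np


def save_depth_video(depth_maps, out_path='eg3d_output/depth.mp4', fps=30):
    """Normalize each (H, W) depth map to [0, 255] and write the frames as a video."""
    frames = []
    for depth in depth_maps:
        d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
        frames.append((d * 255).astype(np.uint8))
    imageio.mimwrite(out_path, frames, fps=fps)
```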
You should prepare your dataset following the official repo. Then preprocess `dataset.json` with the following script:
```python
import json
from argparse import ArgumentParser

from mmengine.fileio.io import load


def main():
    parser = ArgumentParser()
    parser.add_argument(
        'in_anno', type=str, help='Path to the official annotation file.')
    parser.add_argument(
        'out_anno', type=str, help='Path to MMEditing\'s annotation file.')
    args = parser.parse_args()

    # The official annotation stores labels as a list of [name, label] pairs.
    anno = load(args.in_anno)
    labels = anno['labels']

    # Convert the list of pairs into a flat {name: label} mapping.
    anno_dict = {}
    for name, label in labels:
        anno_dict[name] = label

    with open(args.out_anno, 'w') as file:
        json.dump(anno_dict, file)


if __name__ == '__main__':
    main()
```
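For example, if the snippet is saved as `tools/convert_eg3d_anno.py` (the path is just an example), running `python tools/convert_eg3d_anno.py dataset.json eg3d_anno.json` converts the official `labels` list of `[filename, label]` pairs into the flat `{filename: label}` mapping used as the converted annotation file.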
```bibtex
@InProceedings{Chan_2022_CVPR,
    author    = {Chan, Eric R. and Lin, Connor Z. and Chan, Matthew A. and Nagano, Koki and Pan, Boxiao and De Mello, Shalini and Gallo, Orazio and Guibas, Leonidas J. and Tremblay, Jonathan and Khamis, Sameh and Karras, Tero and Wetzstein, Gordon},
    title     = {Efficient Geometry-Aware 3D Generative Adversarial Networks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {16123-16133}
}
```