Task: 3D-aware Generation
Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. We introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
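The hybrid explicit-implicit backbone mentioned above is the tri-plane representation: a StyleGAN2-style 2D CNN generator outputs three axis-aligned feature planes, and a 3D point is decoded by projecting it onto each plane, bilinearly sampling a feature, summing the three features, and passing the result through a small MLP before volume rendering and super-resolution. The snippet below is a minimal, illustrative sketch of the sampling step only; the tensor shapes and function names are assumptions, not the code used in this repository.

```python
import torch
import torch.nn.functional as F


def sample_triplane(planes: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """planes: (3, C, H, W) feature planes (XY, XZ, YZ) produced by a 2D generator.
    points: (N, 3) 3D coordinates normalized to [-1, 1].
    Returns per-point features of shape (N, C)."""
    # Project each 3D point onto the three axis-aligned planes.
    projections = (points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]])
    feats = []
    for plane, coords in zip(planes, projections):
        grid = coords.view(1, -1, 1, 2)                     # (1, N, 1, 2)
        sampled = F.grid_sample(
            plane[None], grid, mode='bilinear', align_corners=False)
        feats.append(sampled.view(plane.shape[0], -1).t())  # (N, C)
    # Aggregate the three plane features by summation, as described in the paper.
    return feats[0] + feats[1] + feats[2]
```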
Model | Dataset | Comment | FID50k | FID50k-Camera | Download |
---|---|---|---|---|---|
ShapeNet-Car | ShapeNet-Car | official weight | 5.6573 | 5.2325 | model
AFHQ | AFHQ | official weight | 2.9134 | 6.4213 | model |
FFHQ | FFHQ | official weight | 4.3076 | 6.4453 | model |
`FID50k-Camera` denotes images generated with randomly sampled camera positions. `FID50k` denotes images generated with camera positions randomly sampled from the original dataset.
All metrics are evaluated under FP32, and it is hard to predict how they would change under FP16. For example, if FP16 is used in the super-resolution module of the FFHQ model, the output images are slightly blurrier than those generated under FP32, yet the FID (4.03) is better than the FP32 result.
You can use the following command to generate a sequence of images with continuously changing camera positions as input.
```shell
python demo/mmediting_inference_demo.py --model-name eg3d \
    --model-config configs/eg3d/eg3d_cvt-official-rgb_afhq-512x512.py \
    --model-ckpt https://download.openmmlab.com/mmediting/eg3d/eg3d_cvt-official-rgb_afhq-512x512-ca1dd7c9.pth \
    --result-out-dir eg3d_output \  # save images and videos to `eg3d_output`
    --interpolation camera \  # interpolate the camera position only
    --num-images 100  # generate 100 images during interpolation
```
The following video will be saved to `eg3d_output`.
combine_seed2022.mp4
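For intuition, camera interpolation amounts to sweeping camera-to-world poses along a smooth trajectory around the subject. The sketch below is illustrative only; the demo script builds its poses internally, and the `radius`, `height`, and look-at convention used here are assumptions.

```python
import numpy as np


def look_at(cam_pos, target=np.zeros(3), up=np.array([0.0, 1.0, 0.0])):
    """Build a 4x4 camera-to-world matrix for a camera at `cam_pos` looking at `target`."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0], c2w[:3, 1], c2w[:3, 2] = right, true_up, -forward
    c2w[:3, 3] = cam_pos
    return c2w


def circular_trajectory(num_images=100, radius=2.7, height=0.2):
    """Camera poses evenly spaced on a circle around the origin."""
    angles = np.linspace(0.0, 2.0 * np.pi, num_images, endpoint=False)
    return [
        look_at(np.array([radius * np.cos(a), height, radius * np.sin(a)]))
        for a in angles
    ]
```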
To interpolate the camera position and style code at the same time, you can use the following command.
```shell
python demo/mmediting_inference_demo.py --model-name eg3d \
    --model-config configs/eg3d/eg3d_cvt-official-rgb_ffhq-512x512.py \
    --model-ckpt https://download.openmmlab.com/mmediting/eg3d/eg3d_cvt-official-rgb_ffhq-512x512-5a0ddcb6.pth \
    --result-out-dir eg3d_output \  # save images and videos to `eg3d_output`
    --interpolation both \  # interpolate both the camera position and the conditioning style code
    --num-images 100 \  # generate 100 images during interpolation
    --seed 233  # set the random seed to 233
```
ffhq-both-seed233.mp4
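Conceptually, `--interpolation both` also blends the latent (style) code from frame to frame while the camera moves. Below is a minimal sketch of linear latent interpolation, assuming one latent vector per frame; this is not the demo's actual implementation, and in practice interpolation is often done in the mapped latent space instead.

```python
import numpy as np


def interpolate_latents(z_start, z_end, num_images=100):
    """Linearly blend two latent codes into a sequence of per-frame codes."""
    weights = np.linspace(0.0, 1.0, num_images)
    return [(1.0 - w) * z_start + w * z_end for w in weights]
```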
If you only want to save a video of the depth maps, you can use the following command:
```shell
python demo/mmediting_inference_demo.py --model-name eg3d \
    --model-config configs/eg3d/eg3d_cvt-official-rgb_shapenet-128x128.py \
    --model-ckpt https://download.openmmlab.com/mmediting/eg3d/eg3d_cvt-official-rgb_shapenet-128x128-85757f4d.pth \
    --result-out-dir eg3d_output \  # save images and videos to `eg3d_output`
    --interpolation camera \  # interpolate the camera position only
    --num-images 100 \  # generate 100 images during interpolation
    --vis-mode depth  # only visualize the depth maps
```
car-depth_seed0.mp4
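As a rough illustration of what `--vis-mode depth` produces, raw depth maps need to be normalized before they can be written out as video frames. The sketch below uses `imageio` only as an example writer and is not necessarily what the demo script does internally.

```python
import imageio.v2 as imageio
import numpy as np


def save_depth_video(depth_maps, out_path='eg3d_output/depth.mp4', fps=30):
    """Normalize each (H, W) depth map to [0, 255] and write the frames as a video."""
    frames = []
    for depth in depth_maps:
        d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
        frames.append((d * 255).astype(np.uint8))
    imageio.mimwrite(out_path, frames, fps=fps)
```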
You should prepare your dataset following the official repo. Then preprocess `dataset.json` with the following script:
```python
import json
from argparse import ArgumentParser

from mmengine.fileio.io import load


def main():
    parser = ArgumentParser()
    parser.add_argument(
        'in_anno', type=str, help='Path to the official annotation file.')
    parser.add_argument(
        'out_anno', type=str, help='Path to MMEditing\'s annotation file.')
    args = parser.parse_args()

    # The official annotation stores labels as a list of [name, label] pairs.
    anno = load(args.in_anno)
    labels = anno['labels']

    # Convert the list of pairs into a flat {name: label} mapping.
    anno_dict = {}
    for name, label in labels:
        anno_dict[name] = label

    with open(args.out_anno, 'w') as file:
        json.dump(anno_dict, file)


if __name__ == '__main__':
    main()
```
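For example, if the snippet is saved as `tools/convert_eg3d_anno.py` (the path is just an example), running `python tools/convert_eg3d_anno.py dataset.json eg3d_anno.json` converts the official `labels` list of `[filename, label]` pairs into the flat `{filename: label}` mapping used as the converted annotation file.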
```bibtex
@InProceedings{Chan_2022_CVPR,
    author    = {Chan, Eric R. and Lin, Connor Z. and Chan, Matthew A. and Nagano, Koki and Pan, Boxiao and De Mello, Shalini and Gallo, Orazio and Guibas, Leonidas J. and Tremblay, Jonathan and Khamis, Sameh and Karras, Tero and Wetzstein, Gordon},
    title     = {Efficient Geometry-Aware 3D Generative Adversarial Networks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {16123-16133}
}
```