Free3D

[CVPR'24] Consistent Novel View Synthesis without 3D Representation

[arXiv] [Project] [BibTeX]

Teaser video: teaser.mp4

This repository provides the training and testing code for Free3D, by Chuanxia Zheng and Andrea Vedaldi of VGG at the University of Oxford. Given a single-view image, Free3D synthesizes consistent novel views without the need for an explicit 3D representation.

Usage

Installation

# create the environment
conda create --name free3d python=3.9
conda activate free3d
# install PyTorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# install the remaining dependencies
pip install -r requirements.txt
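
As a quick sanity check (a minimal sketch, not part of the repository), the following Python lines verify that the installed PyTorch build can see the CUDA 11.8 runtime:

# minimal environment check; illustrative only, not repository code
import torch

print(torch.__version__)           # the conda command above installs a cu118 build
print(torch.cuda.is_available())   # should print True on a CUDA 11.8 machine
print(torch.cuda.device_count())   # number of visible GPUs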

Datasets

  • Objaverse: For training / evaluating on Objaverse (7,729 instances for testing), please download the rendered dataset from zero-1-to-3 (a loading sketch follows this list). The original command they provided is:
    wget https://tri-ml-public.s3.amazonaws.com/datasets/views_release.tar.gz
    
    Unzip the data file and change root_dir in configs/objaverse.yaml.
  • OmniObject3D: For evaluating on OmniObject3D (5,275 instances), please refer to the OmniObject3D GitHub and change root_dir in configs/omniobject3d.yaml. Since we do not train the model on this dataset, we evaluate directly on its training set.
  • GSO: For evaluating on Google Scanned Objects (GSO, 1,030 instances), please download the full set of 3D models and use the rendering code from zero-1-to-3 to obtain 25 views for each scene. Then change root_dir in configs/googlescan.yaml to the corresponding location. Our rendered files are available on Google Drive.
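
For orientation, below is a minimal, hedged sketch of loading one rendered instance. It assumes the zero-1-to-3 layout, where each instance folder stores views as NNN.png with matching NNN.npy camera matrices; verify this against your unpacked data, and note that load_instance is an illustrative helper, not repository code.

# hedged sketch: load one rendered instance, assuming the zero-1-to-3 layout
# (NNN.png views with matching NNN.npy camera matrices per instance folder)
import os
import numpy as np
from PIL import Image

def load_instance(instance_dir):
    views = []
    for name in sorted(os.listdir(instance_dir)):
        if not name.endswith(".png"):
            continue
        stem = os.path.splitext(name)[0]
        image = np.asarray(Image.open(os.path.join(instance_dir, name)))
        pose = np.load(os.path.join(instance_dir, stem + ".npy"))  # camera matrix
        views.append((image, pose))
    return views

# usage: views = load_instance("[root_dir]/[instance id]")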

Inference

  • batch testing for quantitative results
    python batch_test.py \
    --resume [model directory path] \
    --config [configs/*.yaml] \
    --save_path [save directory path]
    
  • single image testing for qualitative results
    # for real examples, please download the segment anything checkpoint
    wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
    # run the single image test command
    python test.py \
    --resume [model directory path] \
    --sam_path [sam checkpoint path] \
    --img_path [image path] \
    --gen_type ['image' or 'video'] \
    --save_path [save directory path]
    
  • the general metrics are evaluated with the command below (a sketch of these metrics follows this list):
    cd evaluations
    python evaluation.py --gt_path [ground truth images path] --g_path [generated NVS images path]
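
For reference, the sketch below illustrates how the standard novel-view-synthesis metrics (PSNR, SSIM, LPIPS) are typically computed; evaluation.py remains the reference implementation, and the VGG backbone for LPIPS and the nvs_metrics helper are assumptions for illustration.

# hedged sketch of standard NVS metrics; evaluation.py is the reference,
# this only illustrates how PSNR / SSIM / LPIPS are typically computed
import torch
import lpips                       # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="vgg")  # assumption: VGG backbone

def nvs_metrics(gt, pred):
    # gt, pred: HxWx3 uint8 arrays of the same novel view
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=2, data_range=255)
    # LPIPS expects NCHW float tensors scaled to [-1, 1]
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    return psnr, ssim, lpips_fn(to_t(gt), to_t(pred)).item()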
    

Training

  • The Ray Conditioning Normalization (RCN), which improves pose accuracy, is trained with the following command (see the ray-conditioning sketch after this list):
    # download the image-conditioned stable diffusion checkpoint released by lambda labs
    # this training takes around 9 days on 4x A6000 (48G)
    wget https://cv.cs.columbia.edu/zero123/assets/sd-image-conditioned-v2.ckpt
    # or download the checkpoint released by zero-1-to-3
    # this training takes around 2 days on 4x A6000 (48G)
    wget https://cv.cs.columbia.edu/zero123/assets/105000.ckpt
    # change finetune_from in train.sh, and run the command
    sh train.sh
    
    
  • The pseudo-3D attention, which improves multi-view consistency, is trained with the same command (1 day on 4x A6000), but with different parameters:
    # modify configs/objaverse.yaml as follows
    views: 4
    use_3d_transformer: True
    # point finetune_from in train.sh to your first-stage model
    finetune_from [RCN trained model]
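
To give a feel for what the first training stage learns, here is a heavily hedged sketch of the ray-conditioning idea: each target ray is encoded (here as Plücker coordinates) and used to predict a per-pixel scale and shift for a normalization layer. All names below (plucker_rays, RayModulatedNorm) are hypothetical illustrations, not Free3D's actual modules; see the paper and the repository code for the real RCN implementation.

# hedged sketch of per-pixel ray conditioning; names are hypothetical
import torch
import torch.nn as nn
import torch.nn.functional as F

def plucker_rays(origins, directions):
    # origins, directions: (B, H, W, 3) -> (B, H, W, 6) Plücker embedding
    d = F.normalize(directions, dim=-1)
    return torch.cat([d, torch.cross(origins, d, dim=-1)], dim=-1)

class RayModulatedNorm(nn.Module):
    # GroupNorm whose per-pixel scale/shift are predicted from ray embeddings
    def __init__(self, channels, ray_dim=6, groups=32):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels, affine=False)
        self.to_scale_shift = nn.Conv2d(ray_dim, channels * 2, kernel_size=1)

    def forward(self, x, rays):          # x: (B,C,H,W), rays: (B,H,W,6)
        rays = rays.permute(0, 3, 1, 2)  # -> (B,6,H,W)
        if rays.shape[-2:] != x.shape[-2:]:
            rays = F.interpolate(rays, size=x.shape[-2:], mode="bilinear")
        scale, shift = self.to_scale_shift(rays).chunk(2, dim=1)
        return self.norm(x) * (1 + scale) + shift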
    

Pretrained models

  • The RCN model (without pseudo-3D attention) is available on Hugging Face; a download sketch follows.
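
One convenient way to fetch a checkpoint programmatically is via huggingface_hub, sketched below; the repo id and filename are placeholders, so take the real values from the Hugging Face page linked above.

# hedged sketch: download the released checkpoint with huggingface_hub;
# repo_id and filename are placeholders -- use the values from the HF page
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(repo_id="[hf user]/Free3D", filename="[checkpoint].ckpt")
# pass the downloaded path to --resume in test.py / batch_test.py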

Related work

Citation

If you find our code helpful, please cite our paper:

@article{zheng2023free3D,
      author    = {Zheng, Chuanxia and Vedaldi, Andrea},
      title     = {Free3D: Consistent Novel View Synthesis without 3D Representation},
      journal   = {arXiv},
      year      = {2023},
}

Acknowledgements

Many thanks to Stanislaw Szymanowicz, Edgar Sucar, and Luke Melas-Kyriazi of VGG for insightful discussions, and to Ruining Li, Eldar Insafutdinov, and Yash Bhalgat of VGG for their helpful feedback. We would also like to thank the authors of Zero-1-to-3 and Objaverse-XL for their helpful discussions.

License

CC BY-NC 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
