
Official implementation of SyncTweedies: A General Generative Framework Based on Synchronized Diffusions (NeurIPS 2024)


KAIST-Visual-AI-Group/SyncTweedies


SyncTweedies: A General Generative Framework Based on Synchronized Diffusions


Jaihoon Kim*, Juil Koo*, Kyeongmin Yeo*, Minhyuk Sung (* denotes equal contribution)

| Website | Paper | arXiv |


Introduction

This repository contains the official implementation of SyncTweedies. SyncTweedies can be applied to various downstream applications, including ambiguous image generation, arbitrary-sized image generation, 360° panorama generation, and texturing of 3D meshes and 3D Gaussians. More results can be found on our project webpage.

We introduce a general diffusion synchronization framework for generating diverse visual content, including ambiguous images, panorama images, 3D mesh textures, and 3D Gaussian splat textures, using a pretrained image diffusion model. We first present an analysis of various scenarios for synchronizing multiple diffusion processes through a canonical space. Based on this analysis, we introduce a novel synchronized diffusion method, SyncTweedies, which averages the outputs of Tweedie’s formula while conducting denoising in multiple instance spaces. Unlike previous work that achieves synchronization through finetuning, SyncTweedies is a zero-shot method that requires no finetuning, preserving the rich prior of diffusion models trained on Internet-scale image datasets without overfitting to specific domains. We verify that SyncTweedies offers the broadest applicability to diverse applications and outperforms the previous state of the art in each application.
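The core idea, averaging the outputs of Tweedie's formula across instance spaces, can be illustrated with a small 1D toy. This is a sketch under stated assumptions, not the repository's implementation: the canonical space is a 1D canvas, each instance is an overlapping window of it (as in arbitrary-sized image generation), the ᾱ schedule is linear, the update is deterministic DDIM, and `toy_eps_model` is a hypothetical stand-in for a pretrained noise predictor.

```python
import numpy as np


def tweedie_x0(x_t, eps, abar_t):
    """Tweedie's formula: posterior-mean estimate of x_0 from x_t and predicted noise."""
    return (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)


def ddim_step(x_t, x0, abar_t, abar_prev):
    """Deterministic DDIM update (eta = 0) driven by a given x_0 estimate."""
    eps = (x_t - np.sqrt(abar_t) * x0) / np.sqrt(1.0 - abar_t)
    return np.sqrt(abar_prev) * x0 + np.sqrt(1.0 - abar_prev) * eps


def toy_eps_model(x, abar_t):
    """Hypothetical noise predictor; depends on window context via x.mean()."""
    return np.sqrt(1.0 - abar_t) * (x - 0.5 * x.mean())


def sync_tweedies_1d(eps_model, canvas_w=32, win=16, stride=8, steps=10, seed=0):
    rng = np.random.default_rng(seed)
    # Each instance space is an overlapping window of the canonical canvas (assumption).
    wins = [slice(s, s + win) for s in range(0, canvas_w - win + 1, stride)]
    # abar[0] = 1.0 is clean data; abar[steps] is the noisiest level (assumed schedule).
    abar = np.linspace(1.0, 0.01, steps + 1)

    # Sample noise in the canonical space and project it to every instance.
    z_T = rng.standard_normal(canvas_w)
    xs = [z_T[w].copy() for w in wins]

    for t in range(steps, 0, -1):
        # 1) Tweedie's formula in each instance space.
        x0s = [tweedie_x0(x, eps_model(x, abar[t]), abar[t]) for x in xs]
        # 2) Unproject to the canonical space and average overlapping values.
        acc = np.zeros(canvas_w)
        cnt = np.zeros(canvas_w)
        for x0, w in zip(x0s, wins):
            acc[w] += x0
            cnt[w] += 1
        canonical_x0 = acc / np.maximum(cnt, 1)
        # 3) Project the averaged estimate back and take a DDIM step per instance.
        xs = [ddim_step(x, canonical_x0[w], abar[t], abar[t - 1])
              for x, w in zip(xs, wins)]
    return xs, wins


xs, wins = sync_tweedies_1d(toy_eps_model)
# Windows [0, 16) and [8, 24) share canvas positions 8..15.
print(np.abs(xs[0][8:] - xs[1][:8]).max())
```

Because every instance starts from noise projected from the canonical space and receives the same averaged x_0 estimate on shared positions, the overlapping regions should stay numerically identical at every step, even though the toy predictor gives different outputs for different windows.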


Environment Setup

Software Requirements

  • Python 3.8
  • CUDA 11.7
  • PyTorch 2.0.0
git clone https://github.com/KAIST-Visual-AI-Group/SyncTweedies
cd SyncTweedies
conda env create -f environment.yml
pip install git+https://github.com/openai/CLIP.git
pip install -e .
3D Mesh Texturing (PyTorch3D)
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu117_pyt200/download.html
3D Gaussians Texturing (Differentiable 3D Gaussian Rasterizer - gsplat)
cd synctweedies/renderer/gaussian/gsplat
python setup.py install
pip install .
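After installation, a quick check that the expected packages are importable can save a failed run later. The sketch below only looks the modules up with `importlib.util.find_spec` without importing them; the module names are assumptions (in particular, that `pip install -e .` exposes a package named `synctweedies`, and that the CLIP and gsplat installs are importable as `clip` and `gsplat`).

```python
import importlib.util
import sys

# Module names assumed from the setup steps above (not confirmed by the repository).
EXPECTED_MODULES = ["torch", "clip", "pytorch3d", "gsplat", "synctweedies"]


def check_environment(modules=EXPECTED_MODULES):
    """Return {module_name: importable?} without actually importing heavy packages."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} (README lists 3.8)")
    for name, ok in check_environment().items():
        print(f"{name:>14}: {'ok' if ok else 'MISSING'}")
```

PyTorch3D and gsplat are only needed for mesh and Gaussian texturing respectively, so a missing entry matters only for those applications.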

Data

3D Mesh Texturing

We use 3D mesh and prompt pairs from Text2Tex and TEXTure. Text2Tex uses a subset of the Objaverse dataset.

  • 3D mesh texturing - data/mesh/turtle.obj (TEXTure), data/mesh/clutch_bag.obj (Text2Tex)

For 3D mesh texture editing, we use a 3D mesh generated with Luma AI.

  • 3D mesh texture editing (SDEdit) - data/mesh/sdedit/mesh.obj (Luma AI)

360° Panorama Generation

We use depth maps from 360MonoDepth to generate 360° panorama images.

  • 360° panorama generation - data/panorama

3D Gaussians Texturing

Download the Synthetic NeRF dataset and reconstruct the 3D scenes using either the 3D Gaussian Splatting framework or gsplat.

Use the reconstructed 3D scene for texturing 3D Gaussians.

  • 3D Gaussians texturing - data/gaussians/chair and data/gaussians/chair.ply.
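The data locations listed above can be checked before running any application. This is a minimal sketch that assumes it is run from the repository root and that `clutch_bag.obj` sits under `data/mesh/` next to `turtle.obj`; `missing_data` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Relative paths as listed in the Data section (clutch_bag.obj location assumed).
EXPECTED_DATA = [
    "data/mesh/turtle.obj",
    "data/mesh/clutch_bag.obj",
    "data/mesh/sdedit/mesh.obj",
    "data/panorama",
    "data/gaussians/chair",
    "data/gaussians/chair.ply",
]


def missing_data(repo_root="."):
    """Return the expected data paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_DATA if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_data()
    print("all data found" if not missing else "missing:\n  " + "\n  ".join(missing))
```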

Inference

Run the commands below for each application.

Ambiguous Image

1-to-1 Projection

python main.py --app ambiguous_image --case_num 2 --tag ambiguous_image --save_dir_now

1-to-n Projection

python main.py --app ambiguous_image --case_num 2 --tag ambiguous_image --save_dir_now --views_names identity inner_rotate

n-to-1 Projection

python main.py --app ambiguous_image --case_num 2 --tag ambiguous_image --save_dir_now --optimize_inverse_mapping

--prompts

Text prompts that guide the generation process (one prompt per view).

--save_top_dir

Directory to save intermediate/final outputs.

--tag

Tag output directory.

--save_dir_now

Save output directory with current time.

--case_num

Denoising case number (Case 2 is SyncTweedies). Refer to the main paper for the other cases.

--seed

Random seed.

--views_names

View transformation applied in each denoising process.

--rotate_angle

Rotation angle for rotation transformations.

--initialize_xt_from_zt

Initialize each instance's starting noise by projecting random noise from the canonical space.

--optimize_inverse_mapping

Use optimization for the projection operation (n-to-1 projection).
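For 1-to-1 projections, each view is a pixel permutation of the canonical image (the convention used by Visual Anagrams, on which this repository builds), so mapping a view's Tweedie output back to the canonical space is just the inverse permutation. The sketch below assumes a hypothetical `inner_rotate` that rotates a centered disk by 90° and leaves the rest of the image untouched; the repository's actual view (which takes `--rotate_angle`) may be defined differently.

```python
import numpy as np


def inner_disk_mask(size):
    """Boolean mask of a centered disk; invariant under 90-degree rotations."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    return (yy - c) ** 2 + (xx - c) ** 2 <= (size / 2.0) ** 2


def inner_rotate(img, k=1):
    """Hypothetical view: rotate only the inner disk of a square image by k*90 degrees."""
    mask = inner_disk_mask(img.shape[0])
    out = img.copy()
    out[mask] = np.rot90(img, k)[mask]
    return out


def inner_rotate_inverse(img, k=1):
    """Inverse view: rotate the inner disk back by -k*90 degrees."""
    return inner_rotate(img, -k)


def average_in_canonical(instance_x0s, inverse_views):
    """Map each instance's Tweedie output back to the canonical image and average."""
    return np.mean([inv(x) for x, inv in zip(instance_x0s, inverse_views)], axis=0)


rng = np.random.default_rng(0)
canonical = rng.standard_normal((8, 8))
views = [lambda x: x, inner_rotate]              # e.g. identity and inner_rotate
inverses = [lambda x: x, inner_rotate_inverse]
instance_x0s = [view(canonical) for view in views]
print(np.allclose(average_in_canonical(instance_x0s, inverses), canonical))
```

Because the disk maps onto itself under a 90° rotation, the view only reorders pixels, which is what makes the projection 1-to-1 and the inverse exact.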

Arbitrary-sized Image
python main.py --app wide_image --prompt "A photo of a mountain range at twilight" --save_top_dir ./output --save_dir_now --tag wide_image --case_num 2 --seed 0 --sampling_method ddim --num_inference_steps 50 --panorama_height 512 --panorama_width 3072 --mvd_end 1.0 --initialize_xt_from_zt 

--prompts

Text prompts to guide the generation process.

--save_top_dir

Directory to save intermediate/final outputs.

--tag

Tag output directory.

--save_dir_now

Save output directory with current time.

--case_num

Denoising case number (Case 2 is SyncTweedies). Refer to the main paper for the other cases.

--seed

Random seed.

--sampling_method

Denoising sampling method.

--num_inference_steps

Number of sampling steps.

--panorama_height

The height of the image to generate.

--panorama_width

The width of the image to generate.

--mvd_end

Point at which synchronization stops, as a fraction of the denoising process (1.0: synchronize at all timesteps, 0.0: no synchronization).

--initialize_xt_from_zt

Initialize each instance's starting noise by projecting random noise from the canonical space.
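To get a feel for how `--panorama_height`, `--panorama_width`, and `--mvd_end` interact, the sketch below tiles the latent canvas with square windows and counts how many windows cover each latent position (the per-position denominator when averaging). It assumes a Stable-Diffusion-style VAE with 8x downsampling, 512-pixel instance windows, and a 64-pixel stride; these values and the `synchronize_at` reading of `--mvd_end` are assumptions, not values taken from the repository.

```python
import numpy as np

LATENT_SCALE = 8  # assumption: Stable-Diffusion-style VAE downsampling factor


def window_offsets(length, window, stride):
    """Start offsets so that windows of size `window` cover [0, length)."""
    offsets = list(range(0, max(length - window, 0) + 1, stride))
    if offsets[-1] + window < length:  # make sure the last window reaches the edge
        offsets.append(length - window)
    return offsets


def latent_windows(panorama_height, panorama_width, view=512, stride_px=64):
    """Top-left (row, col) latent offsets of every instance window."""
    h, w = panorama_height // LATENT_SCALE, panorama_width // LATENT_SCALE
    v, s = view // LATENT_SCALE, stride_px // LATENT_SCALE
    return [(r, c) for r in window_offsets(h, v, s) for c in window_offsets(w, v, s)]


def overlap_counts(panorama_height, panorama_width, view=512, stride_px=64):
    """Number of windows covering each latent position."""
    h, w = panorama_height // LATENT_SCALE, panorama_width // LATENT_SCALE
    v = view // LATENT_SCALE
    counts = np.zeros((h, w), dtype=int)
    for r, c in latent_windows(panorama_height, panorama_width, view, stride_px):
        counts[r:r + v, c:c + v] += 1
    return counts


def synchronize_at(step_index, num_inference_steps, mvd_end=1.0):
    """Assumed reading of --mvd_end: synchronize during the first mvd_end fraction of steps."""
    return step_index < mvd_end * num_inference_steps


counts = overlap_counts(512, 3072)
print(len(latent_windows(512, 3072)), counts.min(), counts.max())
```

Under these assumptions, the 512 x 3072 example above is covered by a single row of windows, and interior positions are shared by several windows, so each averaged value combines multiple instance predictions.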

3D Mesh Texturing
python main.py --app mesh --prompt "A hand carved wood turtle" --save_top_dir ./output --tag mesh  --save_dir_now --case_num 2 --mesh ./data/mesh/turtle.obj --seed 0 --sampling_method ddim --initialize_xt_from_zt

--prompts

Text prompts to guide the generation process.

--save_top_dir

Directory to save intermediate/final outputs.

--tag

Tag output directory.

--save_dir_now

Save output directory with current time.

--case_num

Denoising case number (Case 2 is SyncTweedies). Refer to the main paper for the other cases.

--mesh

Path to input 3D mesh.

--seed

Random seed.

--sampling_method

Denoising sampling method.

--initialize_xt_from_zt

Initialize each instance's starting noise by projecting random noise from the canonical space.

--steps

Number of sampling steps.
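To texture the same mesh with several seeds, the command above can be wrapped in a small launcher. This sketch reuses only the flags from the example command, assumes it is run from the repository root with `python` on the PATH, and defaults to a dry run that just prints the commands; `mesh_command` and `run_seeds` are hypothetical helpers.

```python
import shlex
import subprocess


def mesh_command(prompt, mesh, seed, save_top_dir="./output", tag="mesh"):
    """Build the README's 3D mesh texturing command as an argv list."""
    return [
        "python", "main.py", "--app", "mesh",
        "--prompt", prompt,
        "--save_top_dir", save_top_dir, "--tag", tag, "--save_dir_now",
        "--case_num", "2", "--mesh", mesh,
        "--seed", str(seed), "--sampling_method", "ddim",
        "--initialize_xt_from_zt",
    ]


def run_seeds(prompt, mesh, seeds, dry_run=True):
    """Print (and optionally run) one texturing job per seed."""
    for seed in seeds:
        cmd = mesh_command(prompt, mesh, seed)
        print(shlex.join(cmd))  # shlex.join requires Python 3.8+
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_seeds("A hand carved wood turtle", "./data/mesh/turtle.obj", seeds=[0, 1, 2])
```

Passing an argv list (rather than a single shell string) keeps prompts with spaces or quotes intact; `--save_dir_now` keeps the per-seed outputs from overwriting each other.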

3D Mesh Texture Editing

python main.py --app mesh --prompt "lantern" --save_top_dir ./output --tag mesh  --save_dir_now --case_num 2 --mesh ./data/mesh/sdedit/mesh.obj --seed 0 --sampling_method ddim --initialize_xt_from_zt --sdedit --sdedit_prompt "A Chinese style lantern" --sdedit_timestep 0.2

--sdedit

Edit an existing 3D mesh texture (SDEdit).

--sdedit_prompt

Target editing prompt. This overrides the original prompt.

--sdedit_timestep

Timestep at which noise is added to the existing texture (1.0: x_0, 0.0: x_T).
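SDEdit partially noises the existing texture with the forward diffusion process and then denoises it under the new prompt. The sketch below shows one way to read `--sdedit_timestep` under the stated convention (1.0 maps to x_0, 0.0 to x_T); the linear mapping to a training timestep index, the 1000-step schedule, and the helper names are assumptions rather than the repository's code.

```python
import numpy as np


def sdedit_start_index(sdedit_timestep, num_train_timesteps=1000):
    """Assumed mapping of --sdedit_timestep to a timestep index.

    1.0 -> index 0 (essentially the clean texture x_0); 0.0 -> the last index (x_T).
    """
    if not 0.0 <= sdedit_timestep <= 1.0:
        raise ValueError("sdedit_timestep must be in [0, 1]")
    return int(round((1.0 - sdedit_timestep) * (num_train_timesteps - 1)))


def add_noise(x0, t, alphas_cumprod, rng):
    """Forward diffusion q(x_t | x_0): partially noise the input before denoising."""
    abar = alphas_cumprod[t]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * rng.standard_normal(x0.shape)


# Stable-Diffusion-style "scaled linear" beta schedule (assumed here).
betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, 1000) ** 2
alphas_cumprod = np.cumprod(1.0 - betas)

t_start = sdedit_start_index(0.2)  # the example value used in the command above
noisy = add_noise(np.zeros((4, 4)), t_start, alphas_cumprod, np.random.default_rng(0))
print(t_start, noisy.shape)
```

Under this reading, a small value such as 0.2 starts denoising close to x_T, so the edit can depart substantially from the original texture, while values near 1.0 keep it largely unchanged.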

360° Panorama
python main.py --app panorama --tag panorama --save_top_dir ./output --save_dir_now --prompt "An old looking library" --depth_data_path ./data/panorama/cf726b6c0144425282245b34fc4efdca_depth.dpt --case_num 2 --average_rgb --initialize_xt_from_zt --model controlnet

--prompts

Text prompts to guide the generation process.

--save_top_dir

Directory to save intermediate/final outputs.

--tag

Tag output directory.

--save_dir_now

Save output directory with current time.

--depth_data_path

Path to depth map image.

--case_num

Denoising case number (Case 2 is SyncTweedies). Refer to the main paper for the other cases.

--mesh

Path to input 3D mesh.

--seed

Random seed.

--sampling_method

Denoising sampling method.

--initialize_xt_from_zt

Initialize each instance's starting noise by projecting random noise from the canonical space.

--steps

Number of sampling steps.

--canonical_rgb_h

Resolution (height) of the RGB canonical space.

--canonical_rgb_w

Resolution (width) of the RGB canonical space.

--canonical_latent_h

Resolution (height) of the latent canonical space.

--canonical_latent_w

Resolution (width) of the latent canonical space.

--instance_latent_size

Resolution of the latent instance space.

--instance_rgb_size

Resolution of the RGB instance space.

--theta_range

Azimuth range in degrees (0-360).

--theta_interval

Azimuth interval between adjacent views.

--FOV

Field of view of each perspective (instance) view.

--average_rgb

Perform averaging in the RGB domain (Only valid for Case 2 and Case 5).
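The panorama options describe perspective instance views placed at regular azimuths around a spherical canonical space. The sketch below, assuming the canonical space is an equirectangular image, generates view azimuths from a range and interval, builds pinhole rays for a view with a given field of view, and maps directions to equirectangular pixel coordinates. The coordinate conventions (y up, azimuth measured from +z toward +x, zero elevation) and all function names are assumptions for illustration.

```python
import numpy as np


def view_azimuths(theta_range=(0.0, 360.0), theta_interval=45.0):
    """Azimuths (degrees) of the perspective instance views."""
    start, end = theta_range
    return np.arange(start, end, theta_interval)


def perspective_rays(azimuth_deg, fov_deg, size):
    """Unit rays of a size x size pinhole view at the given azimuth (zero elevation)."""
    f = 0.5 * size / np.tan(np.deg2rad(fov_deg) / 2)  # focal length in pixels
    ij = np.arange(size) + 0.5 - size / 2             # pixel-center offsets
    x, y = np.meshgrid(ij, -ij)                        # image x to the right, y up
    rays = np.stack([x, y, np.full_like(x, f)], axis=-1)
    a = np.deg2rad(azimuth_deg)
    rot = np.array([[np.cos(a), 0.0, np.sin(a)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(a), 0.0, np.cos(a)]])    # rotation about the y (up) axis
    rays = rays @ rot.T
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)


def direction_to_equirect(d, width, height):
    """Map unit direction(s) to equirectangular pixel coordinates (u, v)."""
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d, axis=-1, keepdims=True)
    lon = np.arctan2(d[..., 0], d[..., 2])            # azimuth in [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))    # elevation in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v


# Where does the center pixel of a 90-degree-FOV view at azimuth 90 land?
rays = perspective_rays(90.0, 90.0, 65)
print(direction_to_equirect(rays[32, 32], 2048, 1024))
```

Sampling the canonical image at the (u, v) of every ray gives the projection to an instance view; scattering instance values back to those locations and averaging gives the unprojection.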

3D Gaussians Texturing
python main.py --app gs --tag gs --save_dir_now --save_top_dir ./output --prompt "A photo of majestic red throne, adorned with gold accents" --source_path ./data/gaussians/chair --plyfile ./data/gaussians/chair.ply --dataset_type blender --case_num 2 --zt_init --force_clean_composition 

--prompts

Text prompts to guide the generation process.

--save_top_dir

Directory to save intermediate/final outputs.

--tag

Tag output directory.

--save_dir_now

Save output directory with current time.

--case_num

Denoising case number (Case 2 is SyncTweedies). Refer to the main paper for the other cases.

--source_path

Path to the input dataset (refer to the 3D Gaussian Splatting repository for the data format).

--plyfile

Path to 3D Gaussians model plyfile.

--dataset_type

Input dataset type {colmap, blender}.

--zt_init

Initialize each instance's starting noise by projecting random noise from the canonical space.

--no-antialiased

Use for 3D scenes reconstructed with the 3D Gaussian Splatting framework. Omit this option for 3D scenes reconstructed with gsplat.
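The rule for `--no-antialiased` depends on which tool reconstructed the scene, which is easy to get wrong when switching scenes. This hypothetical helper builds the Gaussian texturing command from the example above and appends `--no-antialiased` only for scenes from the original 3D Gaussian Splatting framework; the `reconstructed_with` parameter and its values are illustrative assumptions.

```python
def gs_command(prompt, source_path, plyfile, dataset_type="blender",
               reconstructed_with="gsplat", save_top_dir="./output", tag="gs"):
    """Build the README's 3D Gaussians texturing command as an argv list."""
    if dataset_type not in {"colmap", "blender"}:
        raise ValueError("dataset_type must be 'colmap' or 'blender'")
    cmd = ["python", "main.py", "--app", "gs", "--tag", tag, "--save_dir_now",
           "--save_top_dir", save_top_dir, "--prompt", prompt,
           "--source_path", source_path, "--plyfile", plyfile,
           "--dataset_type", dataset_type, "--case_num", "2",
           "--zt_init", "--force_clean_composition"]
    # Per the option list: add --no-antialiased only for scenes from the 3DGS framework.
    if reconstructed_with == "3dgs":
        cmd.append("--no-antialiased")
    return cmd


print(" ".join(gs_command("A photo of majestic red throne, adorned with gold accents",
                          "./data/gaussians/chair", "./data/gaussians/chair.ply")))
```

Use `subprocess.run(cmd, check=True)` to actually launch the job once the printed command looks right.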


Citation

@article{kim2024synctweedies,
  title={SyncTweedies: A General Generative Framework Based on Synchronized Diffusions},
  author={Kim, Jaihoon and Koo, Juil and Yeo, Kyeongmin and Sung, Minhyuk},
  journal={arXiv preprint arXiv:2403.14370},
  year={2024}
}

Acknowledgement

This repository is based on Visual Anagrams, SyncMVD, and gsplat. We thank the authors for publicly releasing their code.
