[ECCV2024] TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
This repository is the official implementation of TCAN.
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
Jeongho Kim*, Min-Jung Kim*, Junsoo Lee, Jaegul Choo (*: equal contribution)
Release checklist:
- Inference code
- Model weights
- Training code
Preprocessed TikTok dataset: Download

Unzip the downloaded dataset and set the dataset path as follows:
```bash
cd TCAN
mkdir DATA
cd DATA
ln -s [data_path] TikTok
```

The resulting directory structure should be:

```
TCAN/DATA/TikTok
├── train
└── valid_video
```
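To verify that the symlink resolves correctly, listing the directory should show both splits (a quick check, not part of the original steps):

```bash
# From the TCAN repository root:
ls DATA/TikTok   # expected output: train  valid_video
```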
Create a conda environment and install the dependencies:

```bash
conda create -n tcan python=3.10
conda activate tcan
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install diffusers==0.25.0
pip install xformers==0.0.22
pip install accelerate==0.22.0
pip install transformers==4.32.0
pip install omegaconf
pip install einops
pip install clean-fid
pip install tensorboard
pip install imageio==2.9.0
pip install opencv-python
pip install av==11.0.0
pip install matplotlib
pip install peft==0.9.0
pip install imageio_ffmpeg
pip install ffmpeg
pip install scikit-image==0.20.0
pip install lpips
pip install onnxruntime
pip install numpy==1.26.4
```
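After installation, a quick import check (a convenience snippet, not from the original setup) confirms the core packages load and CUDA is visible:

```bash
python - <<'PY'
# Environment sanity check: import the core packages installed above
# and report versions plus CUDA availability.
import torch, torchvision, diffusers, transformers, xformers
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers", diffusers.__version__, "| transformers", transformers.__version__)
PY
```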
Download the following pretrained weights:
- Stable Diffusion v1.5
- VAE (sd-vae-ft-mse)
- ControlNet (OpenPose)
```bash
git lfs install
cd checkpoints
git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
git clone https://huggingface.co/stabilityai/sd-vae-ft-mse
cd ..
```
```bash
# download yolox_l.onnx and dw-ll_ucoco_384.onnx
cd dwpose/annotator
git clone https://huggingface.co/yzd-v/DWPose ckpts
```
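After cloning, the two ONNX files should be present (a quick check; note the commands above leave you inside `dwpose/annotator`):

```bash
ls ckpts   # expected: dw-ll_ucoco_384.onnx  yolox_l.onnx
cd ../..   # back to the repository root
```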
Place the downloaded weights in the `TCAN/checkpoints` directory.
- Motion module weights provided by AnimateDiff: `mm_sd_v15.ckpt`, `mm_sd_v15_v2.ckpt`
- RealisticVision UNet weights: realisticVision
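For reference, the `checkpoints` directory should then contain roughly the following (filenames taken from the training commands below; the exact layout is an assumption):

```
TCAN/checkpoints
├── stable-diffusion-v1-5/
├── sd-vae-ft-mse/
├── control_v11p_sd15_openpose_RenamedForMA.pth
├── mm_sd_v15.ckpt
├── mm_sd_v15_v2.ckpt
└── realisticVisionV51_v20Novae.safetensors
```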
🔥 We trained our model on two A100 GPUs. 🔥
First stage (the command below runs on a single GPU):

```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 --master_port 3874 train.py \
    --config "./configs/train/first_stage.yaml" \
    --batch_size 2 \
    --motion_type dwpose \
    --pretrained_unet_path "./checkpoints/realisticVisionV51_v20Novae.safetensors" \
    --pretrained_appearance_encoder_path "./checkpoints/realisticVisionV51_v20Novae.safetensors" \
    --pretrained_controlnet_path "./checkpoints/control_v11p_sd15_openpose_RenamedForMA.pth" \
    --freeze_controlnet \
    --init_unet_lora \
    --save_name First_Unetlora
```
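Since training used two A100s, a two-GPU launch would presumably look like the following; only the device list and process count change relative to the command above (how the config splits the batch across GPUs is an assumption):

```bash
# Hypothetical two-GPU variant of the first-stage command above.
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 --master_port 3874 train.py \
    --config "./configs/train/first_stage.yaml" \
    --batch_size 2 \
    --motion_type dwpose \
    --pretrained_unet_path "./checkpoints/realisticVisionV51_v20Novae.safetensors" \
    --pretrained_appearance_encoder_path "./checkpoints/realisticVisionV51_v20Novae.safetensors" \
    --pretrained_controlnet_path "./checkpoints/control_v11p_sd15_openpose_RenamedForMA.pth" \
    --freeze_controlnet \
    --init_unet_lora \
    --save_name First_Unetlora
```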
Second stage. The `--pretrained_unet_path` and `--pretrained_appearance_encoder_path` values below are examples; point them at the checkpoints saved under `./logs/` by your own first-stage run:

```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 --master_port 6836 train.py \
    --config ./configs/train/second_stage.yaml \
    --num_workers 2 \
    --batch_size 1 \
    --is_second_stage \
    --motion_type dwpose \
    --pretrained_unet_path "./logs/20240419_First_Unetlora/models/[UNet]_[Epoch=1]_[Iter=100]_[loss=0.1025].ckpt" \
    --pretrained_appearance_encoder_path "./logs/20240419_First_Unetlora/models/[AppearanceEncoder]_[Epoch=1]_[Iter=100]_[loss=0.1025].ckpt" \
    --pretrained_controlnet_path "./checkpoints/control_v11p_sd15_openpose_RenamedForMA.pth" \
    --init_unet_lora \
    --load_unet_lora_weight \
    --use_temporal_controlnet \
    --save_name SecondUnetloraTctrl
```
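To avoid hand-copying the checkpoint paths, something like the following could locate the newest first-stage checkpoints automatically (a sketch, not part of the original instructions; it assumes the `./logs/<date>_First_Unetlora/models/` layout shown above):

```bash
# Hypothetical helper: pick the most recently written first-stage checkpoints.
# The brackets in the filenames are escaped so bash treats them literally.
UNET_CKPT=$(ls -t ./logs/*_First_Unetlora/models/\[UNet\]_*.ckpt | head -n 1)
AE_CKPT=$(ls -t ./logs/*_First_Unetlora/models/\[AppearanceEncoder\]_*.ckpt | head -n 1)
echo "UNet checkpoint: ${UNET_CKPT}"
echo "Appearance encoder checkpoint: ${AE_CKPT}"
```

These variables can then be passed to the second-stage command via `--pretrained_unet_path "${UNET_CKPT}"` and `--pretrained_appearance_encoder_path "${AE_CKPT}"`.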