# Controlnet Animation (2023)

> Controlnet Application

> **Task**: controlnet_animation

## Abstract

It is difficult to keep frame-to-frame consistency and avoid flickering when using Stable Diffusion to generate a video frame by frame. Here we reproduce two methods that effectively avoid video flickering:

- **ControlNet with multi-frame rendering.** ControlNet is a neural network structure that controls diffusion models by adding extra conditions. Multi-frame rendering is a community method to reduce flickering. We use ControlNet with the HED condition and Stable Diffusion img2img for multi-frame rendering.

- **ControlNet with attention injection.** Attention injection is widely used to generate the current frame from a reference image (see the sketch below for the general idea). There is an implementation in sd-webui-controlnet, and we use some of their code to create the animation in this repo.

You may need about 40 GB of GPU memory to run ControlNet with multi-frame rendering, and about 10 GB for ControlNet with attention injection. If the config file is not changed, ControlNet with attention injection is used by default.
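The snippet below is a minimal, conceptual sketch of the idea behind attention injection, not the implementation used in this repo: the current frame's self-attention is also allowed to attend to keys and values cached from a reference frame, so it can reuse the reference's appearance and stay consistent. The function name and tensor shapes are illustrative assumptions.

```python
import torch


def attention_with_reference(q, k, v, k_ref, v_ref):
    """Conceptual sketch (not the repo's code): self-attention where the
    current frame also attends to keys/values cached from a reference frame.

    All tensors are assumed to have shape (batch, tokens, dim).
    """
    # Concatenating the reference keys/values lets the current frame "copy"
    # appearance from the reference, which reduces flicker between frames.
    k_all = torch.cat([k, k_ref], dim=1)
    v_all = torch.cat([v, v_ref], dim=1)
    attn = torch.softmax(q @ k_all.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v_all
```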

## Demos

Prompt keywords: a handsome man, silver hair, smiling, play basketball

caixukun_dancing_begin_fps10_frames_cat.mp4

Prompt keywords: a handsome man

zhou_woyangni_fps10_frames_resized_cat.mp4

Change the prompt to get different results.

Prompt keywords: a girl, black hair, white pants, smiling, play basketball

caixukun_dancing_begin_fps10_frames_girl2.mp4

## Pretrained models

We use pretrained models from Hugging Face.

| Model             | Dataset | Download               |
| ----------------- | ------- | ---------------------- |
| anythingv3 config | -       | stable diffusion model |
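As a hedged example of fetching such a checkpoint locally, the snippet below uses `huggingface_hub`; the repo id shown is only an illustrative assumption, so use the checkpoint referenced in the anythingv3 config.

```python
from huggingface_hub import snapshot_download

# Illustrative only: the repo id below is an assumption, not necessarily the
# checkpoint referenced by the anythingv3 config.
local_dir = snapshot_download(repo_id='Linaqruf/anything-v3.0')
print(local_dir)  # local directory containing the downloaded model files
```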

## Quick Start

There are two ways to try controlnet animation.

### 1. Use MMagic inference API

Run the following code to generate an animation video.

```python
from mmagic.apis import MMagicInferencer

# Create a MMagicInferencer instance and infer
editor = MMagicInferencer(model_name='controlnet_animation')

prompt = 'a girl, black hair, T-shirt, smoking, best quality, extremely detailed'
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, ' + \
                  'extra digit, fewer digits, cropped, worst quality, low quality'

# You can download the example video with this link:
# https://user-images.githubusercontent.com/12782558/227418400-80ad9123-7f8e-4c1a-8e19-0892ebad2a4f.mp4
video = '/path/to/your/input/video.mp4'
save_path = '/path/to/your/output/video.mp4'

# Run inference to get the result
editor.infer(video=video, prompt=prompt, negative_prompt=negative_prompt, save_path=save_path)
```

### 2. Use the controlnet animation gradio demo

```shell
python demo/gradio_controlnet_animation.py
```

### 3. Change the config to use multi-frame rendering or attention injection

Change `inference_method` in the anythingv3 config.

To use multi-frame rendering:

```python
inference_method = 'multi-frame rendering'
```

To use attention injection:

```python
inference_method = 'attention_injection'
```

## Play animation with SAM

We also provide a demo that plays controlnet animation with SAM. For details, please see OpenMMLab PlayGround.

## Citation

```bibtex
@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models},
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```