Controlnet Application
Task: controlnet_animation
It is difficult to keep consistency and avoid video frame flickering when using Stable Diffusion to generate a video frame by frame. Here we reproduce two methods that effectively avoid video flickering:

1. ControlNet with multi-frame rendering. ControlNet is a neural network structure that controls diffusion models by adding extra conditions. Multi-frame rendering is a community method to reduce flickering. We use ControlNet with the HED condition and Stable Diffusion img2img for multi-frame rendering.
2. ControlNet with attention injection. Attention injection is widely used to generate the current frame from a reference image. There is an implementation in sd-webui-controlnet, and we reuse some of its code to create the animations in this repo.
You may need 40 GB of GPU memory to run ControlNet with multi-frame rendering and 10 GB for ControlNet with attention injection. If the config file is not changed, it defaults to ControlNet with attention injection.
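If it helps to see the idea behind attention injection, below is a minimal, self-contained PyTorch sketch of a toy single-head self-attention that appends keys and values computed from a reference frame. It is only a conceptual illustration, not the actual implementation in MMagic or sd-webui-controlnet, and all names in it are made up for the example.

```python
# Conceptual sketch of attention injection (toy example, not this repo's code).
# The current frame's queries attend to its own keys/values plus keys/values
# computed from a reference frame, so appearance can be copied across frames.
import torch


def self_attention_with_reference(q_proj, k_proj, v_proj, x, x_ref=None):
    """Toy single-head self-attention with optional reference injection.

    x:     (batch, tokens, dim)      features of the current frame
    x_ref: (batch, ref_tokens, dim)  features of the reference frame
    """
    q, k, v = q_proj(x), k_proj(x), v_proj(x)
    if x_ref is not None:
        # Attention injection: concatenate the reference frame's keys/values.
        k = torch.cat([k, k_proj(x_ref)], dim=1)
        v = torch.cat([v, v_proj(x_ref)], dim=1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v


# Tiny usage example with random features.
dim = 64
q_proj = torch.nn.Linear(dim, dim)
k_proj = torch.nn.Linear(dim, dim)
v_proj = torch.nn.Linear(dim, dim)
cur = torch.randn(1, 16, dim)  # tokens of the frame being generated
ref = torch.randn(1, 16, dim)  # tokens of the reference (e.g. previous) frame
out = self_attention_with_reference(q_proj, k_proj, v_proj, cur, x_ref=ref)
print(out.shape)  # torch.Size([1, 16, 64])
```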
Demo results:

- Prompt keywords: a handsome man, silver hair, smiling, play basketball
  (video: caixukun_dancing_begin_fps10_frames_cat.mp4)
- Prompt keywords: a handsome man
  (video: zhou_woyangni_fps10_frames_resized_cat.mp4)

Change the prompt to get a different result:

- Prompt keywords: a girl, black hair, white pants, smiling, play basketball
  (video: caixukun_dancing_begin_fps10_frames_girl2.mp4)
We use a pretrained model from Hugging Face.
| Model | Dataset | Download |
| --- | --- | --- |
| anythingv3 config | - | stable diffusion model |
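If you want to experiment with the same building blocks directly, the sketch below loads an HED ControlNet and an anything-v3-style base model through diffusers and conditions a single frame on its HED edge map. It is a hedged illustration rather than the MMagic animation pipeline (which additionally runs img2img plus multi-frame rendering or attention injection); the Hugging Face model ids and file paths are assumptions, so substitute the checkpoints referenced in the config.

```python
# Hedged sketch: generate one HED-conditioned frame with diffusers.
# The model ids and paths below are assumptions for illustration only.
import torch
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# HED (soft-edge) detector used to build the ControlNet condition.
hed = HEDdetector.from_pretrained('lllyasviel/Annotators')
controlnet = ControlNetModel.from_pretrained(
    'lllyasviel/sd-controlnet-hed', torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    'Linaqruf/anything-v3.0',  # assumed anything-v3 checkpoint id
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to('cuda')

frame = Image.open('/path/to/one/video/frame.png').convert('RGB')
control = hed(frame)  # edge map fed to ControlNet as the extra condition

result = pipe(
    'a girl, black hair, white pants, smiling, play basketball',
    image=control,
    negative_prompt='lowres, bad anatomy, bad hands, worst quality, low quality',
    num_inference_steps=20,
).images[0]
result.save('/path/to/output/frame.png')
```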
There are two ways to try controlnet animation.

1. Use the MMagic inference API. Run the following code to generate an animation video:
```python
from mmagic.apis import MMagicInferencer

# Create a MMagicInferencer instance for the controlnet_animation model
editor = MMagicInferencer(model_name='controlnet_animation')

prompt = 'a girl, black hair, T-shirt, smoking, best quality, extremely detailed'
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, ' + \
    'extra digit, fewer digits, cropped, worst quality, low quality'

# You can download the example video from this link:
# https://user-images.githubusercontent.com/12782558/227418400-80ad9123-7f8e-4c1a-8e19-0892ebad2a4f.mp4
video = '/path/to/your/input/video.mp4'
save_path = '/path/to/your/output/video.mp4'

# Do the inference to get the result
editor.infer(video=video, prompt=prompt, negative_prompt=negative_prompt, save_path=save_path)
```
2. Use the gradio demo:

```shell
python demo/gradio_controlnet_animation.py
```
change "inference_method" in anythingv3 config
To use multi-frame rendering.
inference_method = 'multi-frame rendering'
To use attention injection.
inference_method = 'attention_injection'
We also provide a demo to play controlnet animation with SAM. For details, please see OpenMMLab PlayGround.
Citation

```bibtex
@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models},
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```