Navigate the Universe of Diffusion Models with a Unified Workflow.
UniDiffusion is a toolbox that provides state-of-the-art training and inference algorithms, based on diffusers. UniDiffusion is aimed at researchers and users who wish to deeply customize the training of Stable Diffusion. We hope that this code repository can provide excellent support for future research and application extensions.
If you want to do any of the following, have fun with UniDiffusion:
- Train only the cross attention (or convolution / feedforward / ...) layers.
- Set a different lr / weight decay / ... for different layers.
- Use or support PEFT/PETL methods for different layers and merge them easily, e.g., finetune the convolution layers while updating the attention layers with LoRA.
- Train all parameters in Stable Diffusion, including unet, vae, and text_encoder, with automatic saving and loading.
Note: UniDiffusion is still under development. Some modules are borrowed from other code repositories and have not been tested yet, especially the components that are not enabled by default in the configuration system. We are working hard to improve this project.
- Modular Design. UniDiffusion is designed with a modular architecture. The modular design enables easy implementation of new methods.
- Config System. LazyConfig System for more flexible syntax and cleaner config files.
- Easy to Use.
- Distributed Training: uses accelerate to support all distributed training environments.
- Experiment Tracker: uses wandb to log all training information.
- Distributed Evaluation: evaluates ✅FID, ✅IS, and CLIP Score during training.
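The LazyConfig style (popularized by detectron2, from which UniDiffusion borrows much of its module design) describes each object in a config as "a callable plus its arguments" and builds it only when training starts. The sketch below is a minimal, hypothetical pure-Python imitation of that idea; `lazy_call`, `instantiate`, and the `Optimizer` stand-in are illustrative names, not UniDiffusion's or detectron2's actual API (detectron2's own versions are built on OmegaConf).

```python
from typing import Any, Callable, Dict


def lazy_call(target: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Hypothetical helper: record `target` and its kwargs instead of calling it."""
    def record(**kwargs: Any) -> Dict[str, Any]:
        return {"_target_": target, **kwargs}
    return record


def instantiate(cfg: Any) -> Any:
    """Hypothetical helper: recursively build objects from dicts with a `_target_` key."""
    if isinstance(cfg, dict):
        # Build nested configs first so they can be passed as arguments.
        built = {k: instantiate(v) for k, v in cfg.items() if k != "_target_"}
        if "_target_" in cfg:
            return cfg["_target_"](**built)
        return built
    if isinstance(cfg, list):
        return [instantiate(v) for v in cfg]
    return cfg


class Optimizer:
    """Hypothetical stand-in for a real component such as an optimizer."""
    def __init__(self, lr: float, weight_decay: float = 0.0):
        self.lr = lr
        self.weight_decay = weight_decay


# The config is plain data and can be edited before anything is built ...
cfg = {"optimizer": lazy_call(Optimizer)(lr=1e-4)}
cfg["optimizer"]["weight_decay"] = 1e-2  # e.g. an override from the command line
# ... and objects are only created here.
objs = instantiate(cfg)
print(objs["optimizer"].lr, objs["optimizer"].weight_decay)
```

Because the config stays plain data until `instantiate` is called, any field can be overridden or swapped before training starts, which is what keeps config files short and composable.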
In UniDiffusion, every training method is decomposed along three dimensions:
- Learnable parameters: which layers or modules are updated.
- PEFT/PETL method: how they are updated, e.g., finetuning, low-rank adaptation, adapters, etc.
- Training process: diffusion-denoising by default, which can be extended, as in XTI.
This decomposition lets us run every method through a single, unified training pipeline controlled by a strong config system.
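To make the three dimensions concrete, here is a hypothetical sketch that bundles them into one object and emits a dict in the same shape as the `training_args` used in the config examples of this README. The `TrainingRecipe` class and its fields are illustrative names, not part of UniDiffusion; the sketch only shows how a method reduces to three independent choices and does not train anything.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TrainingRecipe:
    """Hypothetical container for the three dimensions of a training method."""
    params: str                           # learnable parameters, as a match pattern
    peft: str                             # how they are updated, e.g. "finetune" or "lora"
    process: str = "diffusion-denoising"  # the training process

    def to_training_args(self) -> Dict[str, Dict[str, str]]:
        # Mirrors the shape of the `unet.training_args` examples in this README.
        return {self.params: {"mode": self.peft}}


# The same three knobs express several familiar methods.
recipes = {
    "text-to-image finetune": TrainingRecipe(params="", peft="finetune"),
    "text-to-image finetune with lora": TrainingRecipe(params="", peft="lora"),
    "cross attention with lora": TrainingRecipe(params="attn2", peft="lora"),
}

for name, recipe in recipes.items():
    print(f"{name}: {recipe.to_training_args()} ({recipe.process})")
```

Swapping one field while keeping the other two fixed yields a new method, which is the kind of combination that is awkward when each method lives in its own script.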
Example: how the training workflow differs from other codebases.
Here is a simple example. In diffusers, text-to-image finetuning and DreamBooth are trained with separate scripts, like:
python train_dreambooth.py --arg ......
python train_finetune.py --arg ......
This makes combining or adjusting methods difficult (e.g., training only the cross attention layers during DreamBooth).
In UniDiffusion, we can instead define our own training arguments directly in the config file:
# text-to-image finetune
unet.training_args = {'': {'mode': 'finetune'}}
# text-to-image finetune with lora
unet.training_args = {'': {'mode': 'lora'}}
# update cross attention with lora
unet.training_args = {'attn2': {'mode': 'lora'}}
# dreambooth
unet.training_args = {'': {'mode': 'finetune'}}
text_encoder.training_args = {'text_embedding': {'initial': True}}
# dreambooth with small lr for text-encoder
unet.training_args = {'': {'mode': 'finetune'}}
text_encoder.training_args = {'text_embedding': {'initial': True, 'optim_kwargs': {'lr': 1e-6}}}
and then run
accelerate launch scripts/train.py --config-file /path/to/your/config
This makes it easier to customize, combine, and extend methods, and lets you compare methods simply by comparing their configuration files.
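Per-module optimizer settings, such as the `optim_kwargs` in the DreamBooth example above, map naturally onto optimizer parameter groups. The sketch below is a hypothetical illustration, not UniDiffusion's implementation: it assumes each `training_args` key is a regular expression searched against parameter names (the empty string matches everything), that the first matching key wins, and that parameter names stand in for tensors. The helper `build_param_groups` is an illustrative name.

```python
import re
from typing import Any, Dict, List


def build_param_groups(
    param_names: List[str],
    training_args: Dict[str, Dict[str, Any]],
    default_lr: float,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: group names by the first training_args pattern they match."""
    groups: List[Dict[str, Any]] = []
    assigned = set()
    for pattern, args in training_args.items():
        # Each group starts from the default lr and applies its own overrides.
        options = {"lr": default_lr, **args.get("optim_kwargs", {})}
        members = [n for n in param_names
                   if n not in assigned and re.search(pattern, n)]
        if members:
            assigned.update(members)
            groups.append({"params": members, **options})
    return groups


# Parameter names in the style of a diffusers UNet (illustrative subset).
names = [
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight",
    "down_blocks.0.resnets.0.conv1.weight",
]
training_args = {
    "attn2": {"optim_kwargs": {"lr": 1e-5}},  # smaller lr for cross attention
    "": {},                                   # everything else uses the default lr
}
groups = build_param_groups(names, training_args, default_lr=1e-4)
for group in groups:
    print(group["lr"], group["params"])
```

With the names replaced by the corresponding tensors, a list of this shape is what PyTorch optimizers accept as per-parameter options (e.g. `torch.optim.AdamW(groups)`); parameters that match no pattern would simply stay frozen.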
In UniDiffusion, we provide a regular-expression matching system for module selection, which lets you select modules by matching their names against patterns. See Regular Matching for Module Selection for more details.
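As a rough illustration of regex-based selection, the sketch below assumes that a pattern is applied with `re.search` to the dotted module names PyTorch reports via `named_modules()`. UniDiffusion's exact matching rules are described in Regular Matching for Module Selection and may differ; `select_modules` and the name list are hypothetical.

```python
import re
from typing import List

# Module names in the dotted style of a diffusers UNet; this short list is
# illustrative, not exhaustive.
MODULE_NAMES = [
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v",
    "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2",
    "down_blocks.0.resnets.0.conv1",
]


def select_modules(pattern: str, names: List[str]) -> List[str]:
    """Hypothetical helper: return the names that contain a match for `pattern`."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


print(len(select_modules("", MODULE_NAMES)))                # → 6 (every module)
print(select_modules("attn2", MODULE_NAMES))                # all cross attention projections
print(select_modules(r"attn2\.to_(k|v)$", MODULE_NAMES))    # only cross attention K/V
```

The last pattern picks out the cross attention key and value projections, roughly the subset of weights that Custom Diffusion fine-tunes, which shows how a more specific pattern narrows the learnable parameters without touching code.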
UniDiffusion provides extensive support for PEFT/PETL methods. See PEFT/PETL Methods for more details.
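As one example of a PEFT/PETL method, LoRA (in its standard formulation; UniDiffusion's implementation may differ in details such as the scaling factor) keeps a pretrained weight $W_0 \in \mathbb{R}^{d \times k}$ frozen and learns a low-rank update:

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

Only $A$ and $B$ are trained, so the learnable parameters per layer drop from $dk$ to $r(d + k)$. Because $\Delta W$ is an ordinary matrix, it can be folded back into the base weight after training as $W = W_0 + \frac{\alpha}{r} B A$, which is what merging a LoRA into the original model amounts to.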
- Install prerequisites
- Python 3.10
- PyTorch 2.0 + CUDA 11.8
- cuDNN
- Install requirements
pip install -r requirements.txt
- Configure accelerate and wandb
accelerate config
wandb login
See Train textual inversion / Dreambooth / LoRA / text-to-image Finetune for details.
accelerate launch scripts/common.py --config-file configs/train/text_to_image_finetune.py
- Train textual inversion / Dreambooth / LoRA / text-to-image Finetune.
- Customize your training process.
- [TODO] Supporting new dataset.
- [TODO] Supporting new PETL method.
- [TODO] Supporting new training pipeline.
Supported Personalization Methods
- text-to-image finetune
- dreambooth
- lora
- textual inversion
- XTI
- Custom Diffusion
Note: In UniDiffusion, personalization methods are decomposed into trainable parameters, PEFT/PETL methods, and a training process. See the config files for more details.
We plan to add the following features in the future. We also welcome contributions from the community. Feel free to open pull requests or issues to discuss ideas for new features.
- Methods:
- preservation of class semantic priors (dreambooth).
- XTI & Custom Diffusion.
- RepAdapter and LyCORIS.
- Features:
- Merge PEFT to original model.
- Convert model to diffusers and webui format.
- Webui extension.
We welcome contributions from the open-source community!
- Diffusion Trainer is built based on diffusers.
- Much of the module design is borrowed from detectron2 and detrex.
- Some method implementations are borrowed from diffusers and LyCORIS.
If you use this toolbox in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:
- Citing UniDiffusion:
@misc{pu2022diffusion,
  author = {Pu Cao and Tianrui Huang and Lu Yang and Qing Song},
  title = {UniDiffusion},
  howpublished = {\url{https://github.com/PRIV-Creation/UniDiffusion}},
  year = {2023}
}