Source code for the paper Zero-Shot Duet Singing Voices Separation with Diffusion Models, presented at the SDX Workshop 2023.
Install requirements
pip install -r requirements.txt
Add environment variables: rename .env.tmp to .env and replace the example values with your own (the values below are random).
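For example:

mv .env.tmp .env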
DIR_LOGS=/logs
DIR_DATA=/data
# Required if using wandb logger
WANDB_PROJECT=audioproject
WANDB_ENTITY=johndoe
WANDB_API_KEY=a21dzbqlybbzccqla4txa21dzbqlybbzccqla4tx
The config we used for the paper is exp/singing.yaml; you can run it with:
python train.py exp=singing
You'll need to download the relevant datasets and resample them to 24 kHz, e.g. as sketched below.
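A minimal resampling sketch using torchaudio (the file paths are placeholders; loop over your dataset as needed):

import torchaudio

# Load one file from the original dataset (placeholder path).
wav, sr = torchaudio.load('input.wav')
# Resample to the 24 kHz rate the model expects.
wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=24000)
torchaudio.save('input_24k.wav', wav, sample_rate=24000)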
Then, modify the datamodule section of the config to point to the right path.
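For illustration only (the actual key names live in exp/singing.yaml and may differ from this sketch):

datamodule:
  dataset:
    path: /your/path/to/dataset_24k  # your resampled 24 kHz data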
Resume run from a checkpoint
python train.py exp=singing +ckpt=/logs/ckpts/2022-08-17-01-22-18/'last.ckpt'
First, download the MedleyVox dataset.
Then, run the following command to evaluate the model on the duet
subset of the dataset.
python eval.py logs/runs/XXXX/.hydra/config.yaml logs/ckpts/XXXX/last.ckpt /your/path/to/MedleyVox -T 100 --cond --hop-length 32768 --self-cond --retry 2
Some important arguments:
- -T: number of diffusion steps
- --cond: use auto-regressive conditioning on the ground truth (teacher forcing); without this flag, the model generates the full-length audio at once
- --self-cond: perform auto-regressive conditioning on the generated audio instead, when used together with --cond
- --hop-length: the hop length of the moving window
- --window: the size of the moving window; defaults to the same length as the training data
- --retry: number of retries for each auto-regressive step; the algorithm generates retry + 1 candidates and picks the one most similar to the ground truth (defaults to 0; see the sketch after this list)
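As a rough sketch of how these options interact (sample_fn, the window default, and the plain L2 similarity below are assumptions for illustration; the real logic lives in eval.py and may differ):

import torch

def autoregressive_eval(sample_fn, mixture, ground_truth,
                        window=131072, hop=32768, retry=2, self_cond=True):
    """Moving-window generation with retry-based candidate selection.

    sample_fn(mixture_chunk, context) is a stand-in for the diffusion
    sampler; `context` is the audio preceding the current window.
    """
    out = torch.zeros_like(ground_truth)
    for start in range(0, mixture.shape[-1] - window + 1, hop):
        end = start + window
        # --self-cond: condition on what was generated so far;
        # --cond alone: condition on the ground truth (teacher forcing).
        context = out[..., :start] if self_cond else ground_truth[..., :start]
        # --retry: draw retry + 1 candidates for this window ...
        candidates = [sample_fn(mixture[..., start:end], context)
                      for _ in range(retry + 1)]
        # ... and keep the one closest to the ground truth
        # (L2 distance as a placeholder similarity).
        dists = torch.stack([(c - ground_truth[..., start:end]).pow(2).sum()
                             for c in candidates])
        out[..., start:end] = candidates[int(dists.argmin())]
    return out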
For other arguments, please check out the code.
This baseline depends on torchnmf.
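torchnmf is available on PyPI:

pip install torchnmf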
python eval_nmf.py /your/path/to/MedleyVox/ --thresh 0.08 --division 10 --kernel-size 7
Our pre-trained singing voice diffusion model can be downloaded here. You can find the training logs and unconditional singing samples generated during training on wandb.
How do I load the model once I'm done training?
If you want to load the checkpoint to resume training with the trainer, you can run:

python train.py exp=my_experiment +ckpt=/logs/ckpts/2022-08-17-01-22-18/'last.ckpt'
Otherwise if you want to instantiate a model from the checkpoint:
from main.mymodule import Model

model = Model.load_from_checkpoint(
    checkpoint_path='my_checkpoint.ckpt',
    learning_rate=1e-4,
    beta1=0.9,
    beta2=0.99,
    in_channels=1,
    patch_size=16,
    # ...all other constructor parameters used during training
)
To get only the PyTorch .pt checkpoint, you can save the internal model weights with torch.save(model.model.state_dict(), 'torchckpt.pt').
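To later load those weights back into the inner network (continuing the example above, and assuming model.model is the network whose weights were exported):

import torch

# Restore the raw weights into the inner network of an instantiated model.
state_dict = torch.load('torchckpt.pt')
model.model.load_state_dict(state_dict)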
Why is no checkpoint created at the end of the epoch?
If the epoch is shorter than log_every_n_steps, the checkpoint is not saved at the end of the epoch but only after the given number of steps. If you want to checkpoint more frequently, you can add every_n_train_steps to the ModelCheckpoint, e.g.:
model_checkpoint:
_target_: pytorch_lightning.callbacks.ModelCheckpoint
monitor: "valid_loss" # name of the logged metric which determines when model is improving
save_top_k: 1 # save k best models (determined by above metric)
save_last: True # additionally always save model from last epoch
mode: "min" # can be "max" or "min"
verbose: False
dirpath: ${logs_dir}/ckpts/${now:%Y-%m-%d-%H-%M-%S}
filename: '{epoch:02d}-{valid_loss:.3f}'
every_n_train_steps: 10
Note that saving the checkpoint so frequently is not recommended in general, since storing the file takes some time.