Julien Guinot1,2,
Elio Quinton2,
Gyorgy Fazekas1
1 Queen Mary University of London, 2 Universal Music Group
Official PyTorch implementation of the paper Semi-Supervised Contrastive Learning of Musical Representations, by Julien Guinot, Elio Quinton and Gyorgy Fazekas. Accepted and to be published at ISMIR 2024 in San Francisco.
In this work, we introduce a novel framework for semi-supervised contrastive learning that requires only minimal amounts of labeled data. This flexible framework can use any supervision signal (binary labels, multiclass labels, multi-label annotations, regression targets, or a domain-knowledge similarity metric) to guide contrastive learning towards musically informed representations.
Briefly, the contributions of this work are the following:
- We propose an architecturally simple extension of self-supervised and supervised contrastive learning to the semi-supervised case with the ability to make use of a variety of supervision signals.
- We show that our method shapes the learned representations according to the supervision signal used, with minimal performance loss on other tasks.
- We propose a representation learning framework with low-data regime potential and higher robustness to data corruption.
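At its core, the supervision signal only changes which pairs in a batch are treated as positives in the contrastive target matrix. As a conceptual sketch only (the function name, multi-hot label format and two-views-per-clip layout are assumptions, not the repository's exact implementation), one way of building such a target matrix is:

```python
import torch

def semi_supervised_target(labels: torch.Tensor) -> torch.Tensor:
    """Sketch: build a (2N, 2N) binary matrix of contrastive positives.

    `labels` is an (N, C) multi-hot matrix for labeled clips, with all-zero
    rows for unlabeled clips; each clip is assumed to contribute two
    augmented views, stacked as [views_1; views_2].
    """
    n = labels.shape[0]
    # Self-supervised positives: the two views of the same clip agree.
    self_pos = torch.eye(n, dtype=torch.bool)
    # Supervised positives: labeled clips sharing at least one class.
    label_pos = (labels.float() @ labels.float().T) > 0
    labeled = labels.sum(dim=1) > 0
    label_pos &= labeled.unsqueeze(0) & labeled.unsqueeze(1)
    pos = self_pos | label_pos
    # Tile to cover both views of each clip, then drop self-similarity.
    pos = pos.repeat(2, 2)
    pos.fill_diagonal_(False)
    return pos.float()
```

Such a matrix can then serve as the target in a SupCon-style contrastive loss: with no labels it reduces to the usual self-supervised (SimCLR-like) positives, and with all samples labeled it reduces to supervised contrastive learning.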
We recommend setting up a new environment (using conda or venv) before installing the requirements for this project. This project uses PyTorch Lightning for training and probing experiments. We are currently working on a minimal requirements file and package for loading models and running inference only.
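For example (the environment name and Python version here are arbitrary):
conda create -n semisupcon python=3.10 && conda activate semisupcon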
git clone https://github.com/Pliploop/SemiSupCon.git && cd SemiSupCon
pip install -r requirements.txt
We do not yet provide scripts for downloading datasets, but the following section provides a comprehensive guide on how to implement your own dataset or re-use the currently implemented datasets, namely:
- MagnaTagATune (all tags and top 50 tags)
- MTG Jamendo (top 50 tags, moodtheme split, genre split, instrumentation split)
- GTZAN
- Vocalset (singer identity or technique)
- Giantsteps key (as implemented in MARBLE)
- MedleyDB
- NSynth (pitch class and instrument)
- EmoMusic
We provide a script, pretrain.py, to train your own SemiSupCon model in a supervised, semi-supervised or self-supervised fashion. Training arguments are provided through PyTorch Lightning and LightningCLI, meaning config .yaml files are used to specify training arguments; all arguments can also be overridden from the command line. The files
config/pretraining.pretrain_sl.yaml
config/pretraining.pretrain_smsl.yaml
config/pretraining.pretrain_ssl.yaml
provide boilerplate templates for training. Disregarding details of dataset availability, a training run can be started with:
python pretrain.py --config config_file_path (--trainer.max_steps=100000 --data.data_dir=x --model.optimizer.lr=0.01)
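For reference, the config files follow the standard LightningCLI layout with trainer, model and data sections. The snippet below is an illustrative sketch using arguments documented in this README, with assumed values; it is not a verbatim copy of the provided templates:

```yaml
trainer:
  max_steps: 100000
model:
  optimizer:
    lr: 0.0001
data:
  data_dir: /path/to/audio      # source of self-supervised audio
  ssl_task: null                # optional named ssl task
  sl_task: mtat_top50           # named supervised task
  supervised_data_p: 0.5        # proportion of labeled data used
  intrabatch_supervised_p: 0.5  # proportion of labeled samples per batch
```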
When running an experiment, the config arguments --data.ssl_task and --data.sl_task are essential: they determine which data is used for self-supervised and supervised dataloading.
All sl_tasks must be named, as their name determines the dataloading logic used in the dataloading/datamodule_splitter.py file. Naming an ssl_task is optional (e.g. when a specific train/test/val split is required), but all named tasks must have their logic implemented in SemiSupCon/dataloading/datamodule_splitter.py.
If None is provided for sl_task, training will be self-supervised. If None is provided for ssl_task, the datamodule will look to --data.data_dir for a data source folder and split it using --data.val_split; if no self-supervised data is provided at all, training will be fully supervised.
If none of sl_task, ssl_task, or data_dir is provided, training will not be able to run.
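For example, assuming the template configs above and purely illustrative paths and task names:
Self-supervised only (no sl_task; audio taken from data_dir):
python pretrain.py --config config/pretraining.pretrain_ssl.yaml --data.data_dir=/path/to/audio
Semi-supervised (labeled mtat_top50 data mixed with self-supervised audio from data_dir):
python pretrain.py --config config/pretraining.pretrain_smsl.yaml --data.sl_task='mtat_top50' --data.data_dir=/path/to/audio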
An additional flag forces fully supervised training even when an ssl directory or task is provided: set --data.fully_supervised to true to enable this behaviour:
python pretrain.py --config config_file_path --data.sl_task='mtat_top50' --data.fully_supervised=true
All logic for dealing with specific datasets is handled in the datamodule_splitter.py file. If you wish to implement your own dataset for semi-supervised or fully supervised training, a get_{dataset}_annotations method must be implemented which returns an annotations dataframe and, optionally, an idx2class dictionary. Please take inspiration from existing get_{dataset}_annotations methods to build your own. Once the method is implemented, kwargs for loading (e.g. file paths) can be passed through --data.sl_kwargs in the config file for flexibility. For existing datasets, these kwargs should be adapted from the defaults provided in the datamodule splitter.
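As a hedged illustration, a new method could look like the sketch below. The dataset name, file paths and column names are hypothetical; match the returned dataframe's columns and the method signature to the existing get_{dataset}_annotations implementations:

```python
import pandas as pd

def get_mydataset_annotations(self, annotations_path, audio_dir):
    """Hypothetical example: return (annotations dataframe, idx2class dict)."""
    raw = pd.read_csv(annotations_path)  # assumed columns: file_name, label, split
    classes = sorted(raw["label"].unique())
    idx2class = {i: c for i, c in enumerate(classes)}
    class2idx = {c: i for i, c in idx2class.items()}

    annotations = pd.DataFrame({
        "file_path": audio_dir + "/" + raw["file_name"],  # paths to audio files
        "labels": raw["label"].map(class2idx),            # integer class indices
        "split": raw.get("split", "train"),               # train/val/test if available
    })
    return annotations, idx2class
```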
Our paper studies the influence of --data.supervised_data_p (the proportion of the training data that is labeled) and --data.intrabatch_supervised_p (the proportion of labeled samples within each batch).
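For instance, to pretrain with only 10% of the data labeled while keeping half of each batch supervised (values purely illustrative):
python pretrain.py --config config_file_path --data.supervised_data_p=0.1 --data.intrabatch_supervised_p=0.5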
Logging was done with wandb for this project, and as such no other logger is implemented thus far. We invite users of other loggers to contribute by providing options to switch between loggers. Logging is turned off by default and can be activated with --log=true. The folder where model checkpoints are saved can also be modified. To resume a run, simply provide the path of the checkpoint to resume from:
python pretrain.py --config config_path --log=true --ckpt_path=myckpt --resume_from_ckpt=myckptpath
A plethora of augmentations is provided in SemiSupCon/dataloading/custom_augmentations and can be changed through the --data.aug_list argument. We recommend not modifying --data.aug_severity from its default, as it relates to one of the experiments in the paper.
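Like any other argument, the augmentation list can be overridden from the command line; the names below are placeholders, use the augmentations defined in SemiSupCon/dataloading/custom_augmentations:
python pretrain.py --config config_file_path --data.aug_list='[augmentation_a, augmentation_b]'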
By default, our approach uses SampleCNN as the encoder. To use your own, simply implement your encoder and change --model.encoder.class_path and, if needed, --model.encoder.init_args in the config file. The target length of loaded audio and the sample rate for the encoder can be changed with --data.target_sr and --data.target_len.
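As a sketch, a custom encoder only needs to map batches of audio to embeddings; the class below and its assumed waveform-in / embedding-out interface are hypothetical, so adapt it to the interface SampleCNN exposes in this repository:

```python
import torch
from torch import nn

class MyEncoder(nn.Module):
    """Hypothetical encoder: raw waveform (batch, samples) -> embedding (batch, dim)."""

    def __init__(self, embedding_dim: int = 512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(128, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() == 2:  # (batch, samples) -> add a channel dimension
            x = x.unsqueeze(1)
        return self.proj(self.conv(x).squeeze(-1))
```

You would then point --model.encoder.class_path at, e.g., my_package.encoders.MyEncoder and pass embedding_dim through --model.encoder.init_args.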
Please get in touch with any questions: [email protected]