.
├── configs # yaml files to generate different datasets
│ ├── teamplate1.yaml # teamplate1 dataset configuration
│ └── teamplate2.yaml # teamplate2 dataset configuration
├── requirements.txt # basic requirements
├── radiomixer
│ ├── config # Configuration loader/saver/validator modules
│ ├── creator # PyTorch Lightning datamodules and datasets
│ ├── io # Audio file loader/saver + Signal datatype module
│ ├── preprocessors # Single/Sequential file processor logic
│ ├── sampler # Transition sampler + Audio segment sampler
│ ├── transforms # Transforms (e.g. audio concatenator, filters, mixer, feature extractors, scallers)
│ └── utils # Utils
└── notebooks # Example notebook
This repository contains code associated with the paper titled "Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast" accepted for publication in IEEE ICASSP 2021 [PDF].
Machine learning models for audio segmentation music-speech detection are generally trained on proprietary audio, which cannot be shared. More importantly, current self-supervised methods in the audio domain require novel audio augmentation with reliable labels to perform auxiliary tasks. RadioMixer is the tool that will allow you:
- Mix audios with radio DJ workflow.
- Extract audio features
To perform datasets augmentation it is necessary to do a cpoule of steps:
- Download and store initial datasets
- Configure configuration file (e.g. teamplate1.yaml)
- Run the script
Download and store your datasets on your local machine. There is no possibility to process datasets with different sampling rates. However, it will be done in the near future.
TransformerChain is in charge of sequentially applying transformations/manipulations to the input audio(s). TransformerChain is configured through .yaml file (see Configuration logic). The package functionality is divided into different parts:
- File loader
- File Saver
- File Sampler
- TransformerChain
Entry point of the code is a preprocessor (SequentialFileProcessor object is configured in configuration file.)
If you want to understand better the code organization please refer to praudio
This section will provied with a brief description of the configuration file. Take a look at teamplate1.yaml for more details Configuration file is divided into five sections:
- FileLoader
type
: Loader typeconfigs
: Loader configs
- FileSaver
type
: Saver typeconfigs
: Saver configs
- FileSampler
dataset_split
: Train/test split (which part of the dataset is foing to be generated)test_size
: Test sizedataset_dirs
: Dataset dirrectories in a list (e.g ["dataset1/", "dataset2/"...])dataset_names
: Dataset names corresponding to dataset dirrectoriesmin_datasets
: Minimum number of dataset used to generate one audiodataset_prob
: Samling dataset probability distributionreplace
: Possibility to samle same datasetseed
: Seed
- SequentialFileProcessor
num_files_generate
: Number of files to generatesave_dir
: Save dirrectorysave_config_dir
: Config save dirrectory
- transform_chain
Transforms
: Sequence of audio transforms and manipulations
Loader is responsible for loading an audio file and create Signal object. Available modules:
ClassicLoader
TIMITLoader
Seeio/loader/loaders.py
for parameters needed to be passed into modules
Saver is responsible for saving Signal object as npz file Available modules:
WaveFeaturesSaver
Seeio/loader/savers.py
for parameters needed to be passed into modules
File sampler is responsible for providing filepaths of audio which will be loaded into Signal object and used to generate new (augmented) audio.
Transform Chain (TR) is responsible for applying different manipulation to audios. There are many steps for which TR is responsible:
- Generate segment parameters for audio segments from loaded audios. (e.g. initial timestamp, segment duration) (Available modules:
EqualSegmentSampler
) - Mixer (Available modules:
CustomMixer
,TorchMixer
). Mixer module is a composition of next manipulations:- Generate transiton parameters (Available modules:
TransitionOverlapedSegmentsParametersSampler
) - Extract audio segments (Available modules:
ExtractSegment
) - Apply Fade in/out manipulations (
CustomFilter
,TorchFilterIn
,TorchFilterOut
)
- Generate transiton parameters (Available modules:
Instrument Recognition in Musical Audio Signals