Paper
- Yun-Ning Hung and Yi-Hsuan Yang. Frame-level instrument recognition by timbre and pitch. In Proc. Int. Society for Music Information Retrieval Conf. (ISMIR), 2018.
The instrument recognition model is trained on the MusicNet dataset, which contains seven instruments: Piano, Violin, Viola, Cello, Clarinet, Horn, and Bassoon.
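For reference, the seven target classes can be represented as a simple label list. This is a minimal sketch for illustration only; the actual index-to-class ordering used by the model is defined in the repo's processing code (process.py / config.py), so verify it there before relying on it.

```python
# Illustrative only: the seven MusicNet instrument classes this model predicts.
# The true index-to-instrument ordering is defined in process.py / config.py.
INSTRUMENTS = ["Piano", "Violin", "Viola", "Cello", "Clarinet", "Horn", "Bassoon"]

def label_name(index):
    """Map a class index from a frame-level prediction to an instrument name."""
    return INSTRUMENTS[index]
```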
File Structure
- ./data/: stores the pre-trained models' parameters. Model parameters are also saved to this directory during training.
- ./function/: stores all the Python files related to training and testing
  - evl.py: score computation and evaluation
  - fit.py: training process
  - lib.py: loss function and model initialization
  - model.py: model structure
  - norm_lib.py: data normalization
  - process.py: data pre-processing
  - config.py: configuration options
  - run.py: starts the training process
  - test_frame.py: starts the evaluation process
- ./mp3/: folder for the MP3/WAV files you want to predict
- ./plot/: folder to store the resulting plots
- ./result/: folder to store the raw prediction data
- predict_pitch.py: pitch extraction
- prediction.py: starts the prediction process
Requirements
- librosa==0.6.0
- matplotlib==2.2.0
- numpy==1.14.2
- pytorch==0.3.1
- mir_eval==0.4
- scikit-learn==0.18.1
- scipy==1.0.1
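To confirm that your environment matches these pins, you can print the installed versions (a minimal sanity check, not part of the repo):

```python
# Minimal environment sanity check (not part of the repo): print the installed
# versions and compare them against the pins listed above.
import librosa, matplotlib, mir_eval, numpy, scipy, sklearn, torch

for name, module in [("librosa", librosa), ("matplotlib", matplotlib),
                     ("numpy", numpy), ("pytorch", torch), ("mir_eval", mir_eval),
                     ("scikit-learn", sklearn), ("scipy", scipy)]:
    print(name, module.__version__)
```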
Prediction
This section is for those who want to load the pre-trained model directly and test it on real music.
- Put MP3/WAV files in the "mp3" folder
- Run the 'predict_pitch.py' script with the name of the song as the first argument
python predict_pitch.py test.mp3
- Run the 'prediction.py' script with the name of the song as the first argument and the model's name as the second argument (model names can be found under data/model/)
python prediction.py test.mp3 residual
- The prediction result will be rendered as a picture and stored in the "plot" folder; the raw prediction data will be stored in the "result" folder (see the sketch below for inspecting it programmatically)
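If you want to work with the raw output directly, something along these lines may help. This is a sketch under assumptions, not the repo's documented API: it assumes prediction.py saves a NumPy array of frame-level activations with shape (frames, instruments), and the file name test.npy is hypothetical; check prediction.py for the actual format.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: the raw result is a NumPy array of shape (n_frames, n_instruments).
# Verify the actual file format written by prediction.py before using this.
pred = np.load("result/test.npy")

plt.imshow(pred.T, aspect="auto", origin="lower", cmap="hot")
plt.yticks(range(7), ["Piano", "Violin", "Viola", "Cello", "Clarinet", "Horn", "Bassoon"])
plt.xlabel("Frame")
plt.ylabel("Instrument")
plt.title("Frame-level instrument activations")
plt.savefig("plot/test_activations.png")
```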
Training and Evaluation Process
- Download the MusicNet dataset (https://homes.cs.washington.edu/~thickstn/start.html)
- Follow the guide in 'process.py' to pre-process the data
- Modify 'config.py' to set the training and evaluation configuration
- Run the python script 'run.py' to start training
- Run the python script 'test_frame.py' to start evaluation (frame-level metrics; see the sketch after this list)
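For orientation, frame-level evaluation of this kind typically thresholds the per-frame activations and scores each instrument separately. The repo's actual metric code lives in function/evl.py; the following is only an illustrative sketch with placeholder data and an assumed 0.5 threshold:

```python
# Illustrative frame-level evaluation; the repo's real metrics are in
# function/evl.py. Uses placeholder data and an assumed 0.5 threshold.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.random.randint(0, 2, size=(1000, 7))  # placeholder ground truth
y_score = np.random.rand(1000, 7)                 # placeholder activations
y_pred = (y_score > 0.5).astype(int)              # binarize per frame

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=None)
for name, score in zip(["Piano", "Violin", "Viola", "Cello", "Clarinet", "Horn", "Bassoon"], f1):
    print(f"{name}: F1 = {score:.3f}")
```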
Reference
Please cite these two papers when you use the MusicNet dataset and the pitch estimator.
- John Thickstun, Zaid Harchaoui, and Sham M. Kakade. Learning features of music from scratch. In Proc. Int. Conf. Learning Representations (ICLR), 2017. [Online] https://homes.cs.washington.edu/~thickstn/musicnet.html
- John Thickstun, Zaid Harchaoui, Dean P. Foster, and Sham M. Kakade. Invariances and data augmentation for supervised music transcription. In Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2018.