
polyphonic music #27

Open
Ayyin opened this issue Jan 24, 2024 · 1 comment
Ayyin commented Jan 24, 2024

Thanks a lot for the great work! I would like to know whether this algorithm works only for monophonic music, or whether it is also applicable to polyphonic music. I've done some previous research on melody extraction for polyphonic music and used the MIR-1K dataset as well. May I ask whether there is any difference in your processing or labelling of this dataset? I trained this algorithm on the audio and annotation files I already had, but the metrics I got were not good. Would it be convenient for you to send me a copy of the dataset?

aRI0U (Collaborator) commented Jan 29, 2024

Hi!
Thanks a lot for your message, glad you like our work. Unfortunately, pesto works for monophonic music only: if you run it on polyphonic music it will likely pick one of the pitches at random, so the output won't be reliable. However, if you're interested in melody extraction, you can train your own pesto model to be more robust to background music by choosing better data augmentations for the invariance loss. We did this in the paper for MIR-1K and the results were decent.

About the MIR-1K dataset specifically, the resolution of the annotations is 20 ms and each pitch label corresponds to the center of its frame, so I had to add a zero at the beginning and end of each annotation file for it to match the number of CQT frames.
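A minimal sketch of that padding step, assuming the annotation file has already been loaded as a plain array of per-frame pitch values (the file parsing and the exact CQT frame count are left out; the function name is illustrative, not part of pesto):

```python
import numpy as np

def pad_annotation(pitch_values):
    """Prepend and append one zero (i.e. unvoiced) frame so the annotation
    length matches the number of CQT frames produced from the audio."""
    pitch = np.asarray(pitch_values, dtype=float)
    return np.concatenate(([0.0], pitch, [0.0]))

# Toy annotation: four 20 ms frames, one of them unvoiced (pitch == 0).
padded = pad_annotation([220.0, 221.5, 0.0, 219.8])
```

Here the padded array has two more frames than the original, with the added boundary frames marked unvoiced so they are ignored by the voiced-frame metrics below.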
Also, the metric we use for evaluation is Raw Pitch Accuracy (RPA), which only takes voiced frames into account. You should get an overall RPA of ~95-96% on MIR-1K; however, if you forget to discard the unvoiced frames, performance drops to ~65%.
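To make the voicing point concrete, here is a rough sketch of RPA, assuming pitch tracks given in Hz with 0 marking unvoiced frames (function names and the cents reference frequency are illustrative assumptions, not pesto's API):

```python
import numpy as np

def hz_to_cents(freqs_hz, ref_hz=10.0):
    """Convert positive frequencies (Hz) to cents relative to ref_hz."""
    return 1200.0 * np.log2(np.asarray(freqs_hz, dtype=float) / ref_hz)

def raw_pitch_accuracy(ref_hz, est_hz, tolerance_cents=50.0):
    """Fraction of *voiced* reference frames whose estimated pitch lies
    within tolerance_cents of the reference pitch."""
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)
    voiced = ref_hz > 0  # unvoiced frames (pitch == 0) are excluded entirely
    errors = np.abs(hz_to_cents(ref_hz[voiced]) - hz_to_cents(est_hz[voiced]))
    return float(np.mean(errors <= tolerance_cents))

# Toy example: 3 voiced reference frames; the estimate misses one by a semitone.
ref = [0.0, 440.0, 440.0, 0.0, 220.0]
est = [100.0, 440.0, 466.16, 50.0, 220.0]
rpa = raw_pitch_accuracy(ref, est)
```

Note that the estimates at the two unvoiced frames are ignored entirely; scoring them as errors is what drags the apparent RPA down to ~65%. For real evaluations, `mir_eval.melody.raw_pitch_accuracy` is the standard implementation of this metric.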

Also, if you train your own model, could you let me know whether you observe a performance gap between your model and the one we provide in this package?
