SF2 Support #76

Merged
merged 20 commits into from
Jul 30, 2024

MyBlackMIDIScore
Member

@MyBlackMIDIScore MyBlackMIDIScore commented May 21, 2024

Right now it is in a working state but there are plenty of things to be added, as well as optimized.
Also the checks are probably going to fail due to MTK having some build errors, for which I have made a PR on its repo (arduano/midi-toolkit-rs#14).

To-Do

  • Stereo linked samples (*3)
  • Fix resampling when loading the soundfont (*1)
  • Figure out a way to de-duplicate some of the instrument/preset generator code
  • Optimize? (*2)

Notable changes

  • Replaced the sinc resampler with rubato, because the current implementation had very low quality results on low sample rate samples. (*1 ->) This still needs to be changed/fixed though, as the looping indexes are usually wrong. One solution I thought of is to not resample anything when loading the soundfont and use the speed multiplier in the spawner settings (essentially resampling live), but this has low quality downsampling, especially using nearest neighbor (default).
  • Fixed some instrument related bugs
  • Changed the instrument replacing rules to the conventional ones
  • Changed the text and names in the examples to imply SF2 support
  • Changed the voice popping algorithm so that if the ignored ID has more voices than the limit it allows those voices to exceed the limit instead of freezing the entire synth

Preview

(*2 ->) Maybe some SIMD on the generator checking and loading but idk how to do that or even if it's possible
(*3 ->) Currently all samples are mono, so SF2s with stereo samples will use 2 voices per key hit. The SF2 spec has some options for linking samples with their stereo pair but it is way more complicated than it seems
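
For reference, a minimal sketch of how stereo linking could be resolved. In the SF2 sample header (`shdr`), `sfSampleType` marks a sample as mono (1), right (2) or left (4), and `wSampleLink` holds the index of the paired sample. The struct and function names below are hypothetical and are not XSynth's actual types; this only illustrates the pairing step, not the playback side.

```rust
/// Sample type values from the SF2 `shdr` chunk (`sfSampleType`).
/// ROM variants additionally set bit 0x8000, which is masked off below.
const RIGHT_SAMPLE: u16 = 2;
const LEFT_SAMPLE: u16 = 4;

/// Hypothetical, simplified sample header: only the fields needed for pairing.
#[derive(Debug, Clone, PartialEq)]
pub struct SampleHeader {
    pub name: String,
    pub sample_type: u16,
    /// Index of the linked sample (`wSampleLink`); only meaningful for left/right samples.
    pub sample_link: u16,
}

/// Returns `(left, right)` index pairs for every stereo pair whose links point at each other.
/// Mono samples and samples with out-of-range or one-sided links are skipped.
pub fn stereo_pairs(headers: &[SampleHeader]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (i, h) in headers.iter().enumerate() {
        if h.sample_type & 0x7FFF != LEFT_SAMPLE {
            continue;
        }
        let j = h.sample_link as usize;
        if let Some(other) = headers.get(j) {
            let is_right = other.sample_type & 0x7FFF == RIGHT_SAMPLE;
            if is_right && other.sample_link as usize == i {
                pairs.push((i, j));
            }
        }
    }
    pairs
}
```

With pairs resolved like this, a left/right pair could in principle be played by a single stereo voice instead of two mono voices, though zones that reference only one half of a pair would still need separate handling.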

@arduano
Collaborator

arduano commented Jun 16, 2024

Hey just FYI, there's this cool thing called Sapling by facebook https://sapling-scm.com/
It makes large PRs a lot easier, because you get to split it up into multiple PRs easily.

Main downside is it replaces git locally, so you'd need to clone a new repository. But once you clone it, you can just
sl pull -B sl2 to grab this branch into the environment, and then if you want to squash the commits together, sl fold --from 023d3e4::149dce9 (the refs of the above commits)

Up to you if you wanna use it, but IMO it's really fun and I can't go back to normal git anymore

If you aren't familiar with diff stacking, here's a video explaining what it is (using a different tool called graphite) https://www.youtube.com/watch?v=I88z3zX3lMY

@MyBlackMIDIScore
Member Author

I'll take a look but probably some time later, I already had a pretty rough time learning git properly lol
Plus in this PR all the changes are basically in the same place
Thanks for the suggestion though

@MyBlackMIDIScore
Member Author

MyBlackMIDIScore commented Jun 18, 2024

Btw @arduano I think I am done with this PR, I just need your input for the changes I mentioned in my first message:

Optimize?

I am not sure what else to do so I will ignore it. The instrument/preset loading is pretty fast anyway, what takes a lot of time is the resampling.

Replaced the sinc resampler with rubato, because the current implementation had very low quality results on low sample rate samples.

Even though the quality has improved, loading soundfonts that need resampling takes a pretty long time. Although if we revert to the old resampler the quality loss is extremely noticeable on low sample rate SFs. Also see below

This still needs to be changed/fixed though, as the looping indexes are usually wrong. One solution I thought of is to not resample anything when loading the soundfont and use the speed multiplier in the spawner settings (essentially resampling live), but this has low quality downsampling, especially using nearest neighbor (default).

I stopped noticing this indexing issue after tinkering with the values of the sinc resampler but I would still like to hear your take on this "live resampling" idea (also about the performance issues I mentioned above).
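
To illustrate the loop index problem: when a sample is resampled on load, its loop points have to be rescaled by the same ratio. The sketch below is a hypothetical helper, not code from this PR. It assumes the resampler adds no leading delay; a resampler that does introduce a delay would also need that offset added, which may or may not be what caused the wrong indexes here.

```rust
/// Rescales loop points from the source sample rate to the target rate.
/// The loop length is rescaled separately from the start so that the two
/// ends cannot drift apart by rounding in opposite directions.
pub fn rescale_loop(start: usize, end: usize, src_rate: u32, dst_rate: u32) -> (usize, usize) {
    let ratio = dst_rate as f64 / src_rate as f64;
    let new_start = (start as f64 * ratio).round() as usize;
    // `saturating_sub` guards against malformed files where end < start.
    let new_len = (end.saturating_sub(start) as f64 * ratio).round() as usize;
    (new_start, new_start + new_len)
}
```

Rounding to the nearest index is one possible choice. For short loops even a one-sample error changes the pitch of the looped section audibly, so keeping the loop length consistent matters more than the exact start position.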

Changed the voice popping algorithm so that if the ignored ID has more voices than the limit it allows those voices to exceed the limit instead of freezing the entire synth

I am also a bit unsure about the implementation of this, mostly regarding performance, because I am searching the voice buffers on every new addition. I'll probably do some benchmarks but if you have some time take a look at this too.
Such a change is necessary though because it caused hard freezes.

@MyBlackMIDIScore MyBlackMIDIScore marked this pull request as ready for review June 18, 2024 21:13
@arduano
Collaborator

arduano commented Jun 18, 2024

You didn't change the actual voice code right, or am I missing something?

Regarding live resampling, it would be a massive performance hit. Currently we just do nearest neighbor, but even linear interpolation takes like a 4x hit or something big like that. The goal is to pre-interpolate as much as possible into very high sample rates (4x or 8x the output sample rate iirc) so when pitch bends happen, the nearest neighbor interpolation becomes higher quality
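
As a rough illustration of the two playback modes being compared, here is a minimal scalar sketch. It is not XSynth's resampler (which uses SIMD); the function names and the `speed` parameter are assumptions chosen to mirror the "speed multiplier" mentioned in this thread.

```rust
/// Reads `out_len` samples from `src`, advancing by `speed` source samples per
/// output sample and picking the closest source sample (no interpolation).
pub fn resample_nearest(src: &[f32], speed: f64, out_len: usize) -> Vec<f32> {
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * speed;
            let idx = pos.round() as usize;
            // Positions past the end of the sample read as silence.
            src.get(idx).copied().unwrap_or(0.0)
        })
        .collect()
}

/// Same traversal, but blends the two neighbouring samples by the fractional position.
pub fn resample_linear(src: &[f32], speed: f64, out_len: usize) -> Vec<f32> {
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * speed;
            let idx = pos.floor() as usize;
            let frac = (pos - pos.floor()) as f32;
            let a = src.get(idx).copied().unwrap_or(0.0);
            let b = src.get(idx + 1).copied().unwrap_or(0.0);
            a + (b - a) * frac
        })
        .collect()
}
```

The linear version needs two reads and a blend per output sample instead of one read, which is where the extra cost comes from. In this sketch the per-sample work is the same for any `speed`, including 1.0. The higher the stored sample rate, the smaller the error nearest neighbor makes when `speed` is not a whole number, which is the idea behind pre-interpolating on load.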

@@ -143,7 +143,9 @@ impl VoiceBuffer {
        }

        if let Some(max_voices) = max_voices {
            if self.options.fade_out_killing {
                if self.buffer.iter().filter(|v| v.id == id).count() > max_voices {
Collaborator

Wait in what context is this condition even possible, what's your layer count? And how many voices do your SF2 spawners create per voice stack?

Member Author

You can debug this by using a soundfont that has 2 voices per key press (for example XP-80 fantasy) and set the layer count to 1. Before this change, two voices with the same ID would be added and ignored. So it would constantly try to reduce the voices to 1 while ignoring two voices and this would end up in an infinite loop, freezing the synth.
What I am skeptical about is the implementation for this fix because it might be slow when the buffers have a lot of voices. I'll think if I can come up with a better solution though.
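
The freeze described above can be reproduced and avoided in a small model. The sketch below is not the PR's implementation (which counts the voices that share the ID and compares that count to the limit); it uses hypothetical, stripped-down types and simply leaves the trimming loop once nothing poppable remains.

```rust
use std::collections::VecDeque;

/// Hypothetical, stripped-down voice: only the group ID matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct Voice {
    pub id: usize,
}

#[derive(Debug, Default)]
pub struct VoiceBuffer {
    pub buffer: VecDeque<Voice>,
}

impl VoiceBuffer {
    /// Removes the oldest voice whose ID differs from `ignored_id`.
    /// Returns `false` when every remaining voice belongs to `ignored_id`.
    fn pop_oldest_except(&mut self, ignored_id: usize) -> bool {
        match self.buffer.iter().position(|v| v.id != ignored_id) {
            Some(idx) => {
                self.buffer.remove(idx);
                true
            }
            None => false,
        }
    }

    /// Adds `count` voices sharing `id`, then trims older voices down to `max_voices`.
    pub fn push_voices(&mut self, id: usize, count: usize, max_voices: Option<usize>) {
        for _ in 0..count {
            self.buffer.push_back(Voice { id });
        }
        if let Some(max) = max_voices {
            while self.buffer.len() > max {
                // Without this exit, a group larger than `max` would loop forever:
                // nothing poppable remains, yet the length never drops to `max`.
                if !self.pop_oldest_except(id) {
                    break;
                }
            }
        }
    }
}
```

With a limit of 1 and a key press that spawns 2 voices, the buffer ends up holding both new voices, which matches the behaviour described in the PR: the newest group is allowed to exceed the limit rather than stalling the synth.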

Collaborator

@arduano arduano Jul 27, 2024

In my opinion, it's probably better to replace Iterator with ExactSizeIterator (a Rust builtin trait), and then do conditions based on that

Member Author

In my opinion, it's probably better to replace Iterator with ExactSizeIterator (a Rust builtin trait), and then do conditions based on that

I am guessing here you mean to use ExactSizeIterator for the voices parameter of the function. I tried this and it will need a lot more changes in the code so I added a counter of the added voices since we iterate over them anyway. If you meant something else let me know.
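
The two options discussed here can be compared side by side. This is a generic sketch with hypothetical function names, not the signature used in the PR: one version counts voices while pushing them, the other asks the iterator for its length up front.

```rust
/// Consumes any iterator, pushing each item into `buffer` and counting as it goes.
/// Works with a plain `Iterator`, so callers are not forced to provide a known length.
pub fn push_counted<T>(buffer: &mut Vec<T>, voices: impl Iterator<Item = T>) -> usize {
    let mut added = 0;
    for voice in voices {
        buffer.push(voice);
        added += 1;
    }
    added
}

/// Alternative: require the length up front via `ExactSizeIterator::len`.
pub fn push_exact<T>(buffer: &mut Vec<T>, voices: impl ExactSizeIterator<Item = T>) -> usize {
    let added = voices.len();
    buffer.extend(voices);
    added
}
```

The trade-off is that adapters such as `filter` do not implement `ExactSizeIterator`, so the second signature pushes that requirement onto every caller, while the counter costs one increment per voice in a loop that already runs.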

@MyBlackMIDIScore
Member Author

You didn't change the actual voice code right, or am I missing something?

Regarding live resampling, it would be a massive performance hit. Currently we just do nearest neighbor, but even linear interpolation takes like a 4x hit or something big like that. The goal is to pre-interpolate as much as possible into very high sample rates (4x or 8x the output sample rate iirc) so when pitch bends happen, the nearest neighbor interpolation becomes higher quality

No I don't think I changed any voice code.

Alright so should I keep the rubato resampler? It can be slower but it has much higher quality resampling.
Also question, does the speed multiplier given to the nearest or linear resamplers affect the speed? Because the sample will be passed through the resampler even if the multiplier is 1. I remember this because when I use the linear interpolator, the rendering is slower no matter the pitch of the sounds.

@MyBlackMIDIScore
Member Author

Btw @arduano could you review this when you find some time so I can make any further changes and merge

I am also planning to do some other things (in another PR) that will probably conflict with this if it's not merged first

@arduano
Collaborator

arduano commented Jul 27, 2024

Hey yeah sorry I must've missed the emails for your original comments

@arduano
Collaborator

arduano commented Jul 27, 2024

Addressed the comments

@MyBlackMIDIScore
Member Author

I made the changes. If there is anything else let me know.
Also what about this?

Alright so should I keep the rubato resampler? It can be slower but it has much higher quality resampling.
Also question, does the speed multiplier given to the nearest or linear resamplers affect the speed? Because the sample will be passed through the resampler even if the multiplier is 1. I remember this because when I use the linear interpolator, the rendering is slower no matter the pitch of the sounds.

@arduano
Collaborator

arduano commented Jul 27, 2024

Alright so should I keep the rubato resampler?

Slower than what, the manual implementation of a sinc resampler I had before? It's probably ok because it only runs on soundfont load, but just make sure it's not horribly slow on bigger SFZs

Also question, does the speed multiplier given to the nearest or linear resamplers affect the speed?

There are 2 places that resampling is done right now: soundfont load and playback. In playback, I found that anything other than nearest neighbor heavily hits the performance. Though this was a while ago, and I noticed a bug in how I was doing SIMD, so I might be wrong.

To compensate for the lack of interpolation on playback, I increased the sample multiplier for soundfont loading, so a 48000 audio sample would become 192000 or something like that. This greatly reduces the static hissing noise during pitch bends (btw only pitch bends are affected).

Of course, the cost is that more memory is required.
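
The memory cost scales directly with the rate multiplier. A back-of-the-envelope sketch, assuming samples are stored as mono `f32` (an assumption for illustration; the helper name is hypothetical):

```rust
/// Bytes needed to hold one mono `f32` sample after resampling from `src_rate` to `dst_rate`.
/// The frame count is rounded up so a partial trailing frame is still counted.
pub fn resampled_bytes(src_frames: usize, src_rate: u32, dst_rate: u32) -> usize {
    let frames = (src_frames as u64 * dst_rate as u64 + src_rate as u64 - 1) / src_rate as u64;
    frames as usize * std::mem::size_of::<f32>()
}
```

For the example in the comment above, going from 48000 to 192000 stores four times as many frames, so a soundfont's sample data would take about four times the memory it needs at its native rate.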

@MyBlackMIDIScore
Member Author

Slower than what, the manual implementation of a sinc resampler I had before? It's probably ok because it only runs on soundfont load, but just make sure it's not horribly slow on bigger SFZs

Yeah, it is slower compared to your resampler. I will do some benchmarks before merging but I'd rather keep it anyway because it improves the quality a lot.

To compensate for the lack of interpolation on playback, I increased the sample multiplier for soundfont loading, so a 48000 audio sample would become 192000 or something like that. This greatly reduces the static hissing noise during pitch bends (btw only pitch bends are affected). Of course, the cost is that more memory is required.

From what I can tell the samples are resampled to the stream's sample rate. Where is that part of code you are talking about?
Also from some tests I did with the new resampler (which does indeed resample to the stream's sample rate) there was not much of a difference in similar use cases.

Also is there anything else I should change before merging?

Collaborator

@arduano arduano left a comment

Also just make sure midis with pitch bends don't hiss. That's the only concern with all this re-sampling

@MyBlackMIDIScore
Member Author

MyBlackMIDIScore commented Jul 30, 2024

Actually nevermind, the resampling seems to be faster with rubato so it's all good. I probably remember it being slower because I compared it to BASSMIDI which doesn't do any resampling on load (because master did not support SF2).

Here are the results of some test runs I did for a few soundfonts. I had it load each soundfont with 4 different sample rates so it would do all kinds of resampling. Btw I did wait for my laptop to completely cool down before running each test just in case.


CFaz III (WAV, 44.1kHz)

master

  • SR: 22.05kHz | Loading time: 3.172118066s
  • SR: 44.1kHz | Loading time: 5.779524395s
  • SR: 48kHz | Loading time: 6.81757479s
  • SR: 96kHz | Loading time: 19.844753979s

sf2

  • SR: 22.05kHz | Loading time: 993.998647ms
  • SR: 44.1kHz | Loading time: 1.601153224s
  • SR: 48kHz | Loading time: 1.762764494s
  • SR: 96kHz | Loading time: 3.16798104s

Amethyst 2.0 (FLAC, 48kHz)

master

  • SR: 22.05kHz | Loading time: 2.252381857s
  • SR: 44.1kHz | Loading time: 3.699557793s
  • SR: 48kHz | Loading time: 3.789707062s
  • SR: 96kHz | Loading time: 7.379370278s

sf2

  • SR: 22.05kHz | Loading time: 994.052036ms
  • SR: 44.1kHz | Loading time: 1.359446302s
  • SR: 48kHz | Loading time: 1.381515688s
  • SR: 96kHz | Loading time: 2.130843635s

JV1080 (WAV, 44.1kHz)

master

As SFZ

  • SR: 22.05kHz | Loading time: 161.156333ms
  • SR: 44.1kHz | Loading time: 244.578163ms
  • SR: 48kHz | Loading time: 255.534699ms
  • SR: 96kHz | Loading time: 467.705528ms

As SF2

N/A

sf2

As SFZ

  • SR: 22.05kHz | Loading time: 78.002109ms
  • SR: 44.1kHz | Loading time: 97.624909ms
  • SR: 48kHz | Loading time: 109.28376ms
  • SR: 96kHz | Loading time: 153.549175ms

As SF2

  • SR: 22.05kHz | Loading time: 109.702865ms
  • SR: 44.1kHz | Loading time: 156.615965ms
  • SR: 48kHz | Loading time: 163.969555ms
  • SR: 96kHz | Loading time: 297.897369ms
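
To put the numbers above side by side, the speedup of the sf2 branch over master can be computed directly from the reported loading times. The sketch below only uses the 96kHz rows copied from the lists above; the function and constant names are made up for illustration.

```rust
/// Speedup of `new` over `old`, both in seconds (values above 1.0 mean `new` is faster).
pub fn speedup(old_secs: f64, new_secs: f64) -> f64 {
    old_secs / new_secs
}

/// (soundfont, master seconds, sf2 seconds) copied from the 96kHz rows above.
/// The JV1080 entry uses the SFZ runs, since master could not load the SF2 version.
pub const RUNS_96K: [(&str, f64, f64); 3] = [
    ("CFaz III", 19.844753979, 3.16798104),
    ("Amethyst 2.0", 7.379370278, 2.130843635),
    ("JV1080 (SFZ)", 0.467705528, 0.153549175),
];

/// Formats one line per soundfont, e.g. "<name>: <ratio>x".
pub fn report() -> Vec<String> {
    RUNS_96K
        .iter()
        .map(|(name, old, new)| format!("{}: {:.2}x", name, speedup(*old, *new)))
        .collect()
}
```

At 96kHz this works out to roughly 3x to 6x faster loading on the sf2 branch for these three soundfonts, with the largest gain on CFaz III.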

Aside from these I also ran the render and send_events benches and the results were pretty much the same. It did show 3-5% performance increase in sf2 but I figured that would be due to my laptop heating up and throttling. But even if it was true then that's good lol

Also about the hissing, I personally didn't notice any difference between master and sf2. Will wait until tomorrow morning just in case there is anything else I should do and then I'll merge.

@arduano
Collaborator

arduano commented Jul 30, 2024

Everything looks good to me yeah, just overall thank you for implementing the #1 biggest feature request in xsynth lol

Feel free to merge whenever you're happy with it

@MyBlackMIDIScore
Member Author

No problem lol, I do enjoy working on this :)
Will merge now

@MyBlackMIDIScore MyBlackMIDIScore merged commit 3385cd0 into master Jul 30, 2024
1 check passed
@arduano
Collaborator

arduano commented Jul 31, 2024

Ok I have to go for real now lol
