wav: add ambisonic bformat subtypes #263

Merged: 1 commit, Mar 2, 2024

Conversation

aentity
Contributor

@aentity aentity commented Feb 18, 2024

Example ambisonic wav files can be found here: https://www.ambisonia.com/

@pdeljanov
Owner

Hi @aentity,

Thanks for this.

The symphonia-format-wav crate has been deprecated and will be removed permanently in 0.6.0. The new crate is symphonia-format-riff. This new crate is a superset of the old one, so you would need to make these same changes to it as well.

Aside from that, wouldn't this need a decoder to decode the ambisonics into proper PCM channels?

@aentity
Contributor Author

aentity commented Feb 28, 2024

@pdeljanov hello.

i am happy to add these changes to riff. i don't know or use your library (yet) :)) but i pushed these at the same time as the hound push: ruuda/hound#72

so for your question: i may have answered it in that thread, but it is a little complicated. if by PCM channel you mean a channel in the wave file, most ambisonic wav files you encounter are 4-channel wav files (with an .amb extension) carrying the guids i added. the extension and the guid tell users that the file holds 4 channels of listenable sound (you can listen, it just sounds a little strange), but that each sample across the channels (a b-format sample) needs further processing before it is played back.

as said, this format is called b-format. it is speaker-layout independent, which means the raw b-format must be decoded to a specific speaker target: binaural stereo, regular stereo, quadraphonic speakers, spherical, 5.1 surround, or any other thing humans can imagine.

however, as i said, i do not know how a user reads files in symphonia, how channels are represented, etc. as a user of the library, i would expect to be able to read in a .wav or .amb file and collect the 4 separate channel samples into a b-format sample (usually represented as w, x, y, z), which can be either integer pcm or float.
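A minimal sketch of that expectation, assuming the four channels arrive interleaved in W, X, Y, Z order and have already been read into a slice. The `BFormat` type and both function names are hypothetical; they are not part of Symphonia or hound.

```rust
/// One first-order B-format sample (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BFormat {
    pub w: f32,
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Groups interleaved 4-channel frames into B-format samples.
/// Assumes channel order W, X, Y, Z; a trailing partial frame is dropped.
pub fn deinterleave_bformat(interleaved: &[f32]) -> Vec<BFormat> {
    interleaved
        .chunks_exact(4)
        .map(|f| BFormat { w: f[0], x: f[1], y: f[2], z: f[3] })
        .collect()
}

/// Converts a 16-bit integer PCM sample to a float in [-1.0, 1.0),
/// so integer and float files can share the same B-format type.
pub fn i16_to_f32(sample: i16) -> f32 {
    f32::from(sample) / 32768.0
}
```

Keeping the sample as a plain struct leaves every later step (mixing, rotation, decoding) up to the caller.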

in an ambisonic pipeline, we can now perform manipulations on the signal, or decode it to a speaker layout.

so yes, the user needs a decoder to transform the raw b-format into something the listener can listen to.
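To give a rough idea of what the simplest such decoder looks like, here is a sketch of a stereo decode using two virtual cardioid microphones. It assumes the traditional FuMa convention (W carries the signal scaled by 1/sqrt(2), X points front, Y points left) and ignores height. The type and function names are hypothetical, and real decoders for larger layouts are considerably more involved.

```rust
use std::f32::consts::SQRT_2;

/// One first-order B-format sample (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BFormat {
    pub w: f32,
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Output of a virtual cardioid microphone aimed at `azimuth` radians,
/// measured counter-clockwise from straight ahead, in the horizontal plane.
/// Assumes FuMa scaling, where W is attenuated by 1/sqrt(2).
pub fn virtual_cardioid(s: BFormat, azimuth: f32) -> f32 {
    0.5 * (SQRT_2 * s.w + azimuth.cos() * s.x + azimuth.sin() * s.y)
}

/// Decodes one B-format sample to (left, right) by aiming two virtual
/// cardioids at +half_angle and -half_angle.
pub fn decode_stereo(s: BFormat, half_angle: f32) -> (f32, f32) {
    (virtual_cardioid(s, half_angle), virtual_cardioid(s, -half_angle))
}
```

With `half_angle` set to 90 degrees, a source encoded hard left comes out entirely in the left channel and is silent in the right.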

i think this is out of scope for symphonia? i do not know. i am working on decoder implementations right now for another project, but a decoder needs an understanding of b-formats, decoder types, complicated math like the pseudo-inverse, and other 3-dimensional manipulations (if desired).

@aentity
Contributor Author

aentity commented Feb 28, 2024

i have added the types to riff, thank you for telling me.

i should also mention: in the future, it would be nice to know that the user is reading a file with a b-format guid. is there a way to retrieve this tag information programmatically?

@pdeljanov
Owner

pdeljanov commented Feb 29, 2024

Hi @aentity,

Thanks for the explanation.

The current set of changes will be sufficient for you to access the 4 channels, however, there are some areas where support could be improved. This PR is probably good enough for now, but I'll list them below for future consideration:

  1. The w, x, y, z channels will either be assigned the speaker positions designated in the channel mask of the WAVE header, or be designated as front left, front right, front center, and LFE (the first 4 speaker positions per the WAVE standard) if no mask is provided. This is because Symphonia 0.5.x doesn't have "unpositioned" channels (i.e., all channels must be assigned a speaker position). Symphonia 0.6 will add support for unpositioned channels which I believe is the more technically correct solution in this case because the decoder will synthesize the real channels. I have a design prototyped for this that I'll be collecting some feedback on soon. Perhaps you could take a look when it is ready.
  2. Without making any other changes, a PCM codec type will be assigned to the codec parameters of the audio track. This means that the PCM decoder will be used for decoding. However, the PCM decoder doesn't know it has to perform additional work to synthesize the audio. It also wouldn't be appropriate to implement this synthesis in the PCM decoder. What could be done is assign an ambisonics PCM codec type when these GUIDs are encountered, and then implement a Decoder for the ambisonics codec types. You could then implement the calculations required to synthesize the actual channels in this decoder. This would be the proper solution within the Symphonia framework.
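The fallback described in item 1 can be sketched as follows. The bit values are the standard WAVE speaker-position flags; the function name and the use of `Option` to model a missing mask are assumptions made for illustration and do not reflect Symphonia's actual code.

```rust
// Standard WAVE speaker-position bits for the first four positions.
pub const SPEAKER_FRONT_LEFT: u32 = 0x1;
pub const SPEAKER_FRONT_RIGHT: u32 = 0x2;
pub const SPEAKER_FRONT_CENTER: u32 = 0x4;
pub const SPEAKER_LOW_FREQUENCY: u32 = 0x8;

/// Returns the header's channel mask if one was provided; otherwise assigns
/// the first `channels` speaker positions in WAVE order.
/// Hypothetical helper, not Symphonia's implementation.
pub fn effective_channel_mask(header_mask: Option<u32>, channels: u32) -> u32 {
    match header_mask {
        Some(mask) => mask,
        // Avoid shifting by 32 or more, which would overflow.
        None if channels >= 32 => u32::MAX,
        None => (1u32 << channels) - 1,
    }
}
```

For a 4-channel B-format file without a mask, this yields front left, front right, front center, and LFE, which is why W, X, Y, Z end up labelled with speaker positions they do not actually represent.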

> i should also mention: in the future, it would be nice to know that the user is reading a file with a b-format guid. is there a way to retrieve this tag information programmatically?

There is currently no way to do this as of 0.5.4. As mentioned in item 2, assigning a codec type of ambisonics could be one way of detecting this.
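As a rough sketch of what GUID-based detection could look like, the function below classifies a 16-byte WAVE_FORMAT_EXTENSIBLE sub-format GUID as stored on disk. The GUID bytes are the standard PCM/float sub-formats and the ambisonic B-format sub-formats as they are commonly documented for .amb files; they should be checked against the merged change. All type and function names here are hypothetical and are not Symphonia's API.

```rust
/// Sample encoding named by the first four bytes of the GUID.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SampleEncoding {
    IntegerPcm,
    IeeeFloat,
}

/// Result of classifying a sub-format GUID (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SubFormat {
    pub encoding: SampleEncoding,
    pub ambisonic: bool,
}

// Trailing 12 bytes of xxxxxxxx-0000-0010-8000-00aa00389b71 (standard sub-formats).
const KS_SUFFIX: [u8; 12] =
    [0x00, 0x00, 0x10, 0x00, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71];
// Trailing 12 bytes of xxxxxxxx-0721-11d3-8644-c8c1ca000000 (ambisonic B-format).
const AMBISONIC_SUFFIX: [u8; 12] =
    [0x21, 0x07, 0xd3, 0x11, 0x86, 0x44, 0xc8, 0xc1, 0xca, 0x00, 0x00, 0x00];

/// Classifies a sub-format GUID in its on-disk byte order.
/// Returns `None` for any GUID this sketch does not recognise.
pub fn classify_sub_format(guid: &[u8; 16]) -> Option<SubFormat> {
    let encoding = match &guid[..4] {
        [1, 0, 0, 0] => SampleEncoding::IntegerPcm,
        [3, 0, 0, 0] => SampleEncoding::IeeeFloat,
        _ => return None,
    };
    let tail: &[u8] = &guid[4..];
    let ambisonic = if tail == &KS_SUFFIX[..] {
        false
    } else if tail == &AMBISONIC_SUFFIX[..] {
        true
    } else {
        return None;
    };
    Some(SubFormat { encoding, ambisonic })
}
```

A reader could expose the `ambisonic` flag through a dedicated codec type, as suggested above, or through any other per-track metadata.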

@pdeljanov
Owner

Another option could be introducing ambisonic channels. This may generalize better across different formats. For example, AAC compressed ambisonic audio channels in MP4.

@aentity
Contributor Author

aentity commented Feb 29, 2024

hello. i don't understand the format CI failure; i had to turn off format-on-save, which is a small annoyance. i have repushed though, and it should be ready. thank you!

some response:

> Symphonia 0.6 will add support for unpositioned channels which I believe is the more technically correct solution in this case because the decoder will synthesize the real channels. I have a design prototyped for this that I'll be collecting some feedback on soon. Perhaps you could take a look when it is ready.

yes sounds more correct. and yes, please do ping me for design, but please note, i am not familiar with your library details :)

> However, the PCM decoder doesn't know it has to perform additional work to synthesize the audio. It also wouldn't be appropriate to implement this synthesis in the PCM decoder. What could be done is assign an ambisonics PCM codec type when these GUIDs are encountered, and then implement a Decoder for the ambisonics codec types. You could then implement the calculations required to synthesize the actual channels in this decoder. This would be the proper solution within the Symphonia framework.

i see. this differs from simpler libraries like hound, which do not have a real notion of a codec like this (e.g., i am writing something like this). i am not sure how far you want to take symphonia's responsibility there. i think it may draw in extra dependencies for you, but it could be interesting. i do not know.

i am just used to reading file, extracting interleaved channels, representing as i want, and then performing operations, for example.

one thing i must stress:

> You could then implement the calculations required to synthesize the actual channels in this decoder

one purpose of ambisonic b-format is that it is useful to perform manipulation on this raw format (transforms, mixing, etc.) and then, at the very end, decode into a real speaker layout. decoding is more of a late-stage plugin. as they say, it is 'speaker agnostic'.
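Two such manipulations can be sketched directly on B-format samples, before any speaker layout is chosen: rotating the sound field about the vertical axis and mixing two fields. This assumes the usual axis convention (X front, Y left, Z up); the type and function names are hypothetical.

```rust
/// One first-order B-format sample (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BFormat {
    pub w: f32,
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Rotates the sound field counter-clockwise about the vertical axis by
/// `angle` radians. W (omnidirectional) and Z (height) are unaffected.
pub fn rotate_yaw(s: BFormat, angle: f32) -> BFormat {
    let (sin, cos) = angle.sin_cos();
    BFormat {
        w: s.w,
        x: s.x * cos - s.y * sin,
        y: s.x * sin + s.y * cos,
        z: s.z,
    }
}

/// Mixes two sound fields by summing them component-wise.
pub fn mix(a: BFormat, b: BFormat) -> BFormat {
    BFormat { w: a.w + b.w, x: a.x + b.x, y: a.y + b.y, z: a.z + b.z }
}
```

Neither operation needs to know whether the result will eventually be played on headphones, stereo speakers, or a surround rig.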

similarly, there is the inverse, where the user wants to encode a signal into b-format, perhaps a mono signal (or a-format, which is what microphones usually pick up, but this is another story), to be mixed with other b-format signals.
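For the mono case, the encode step is a simple pan. The sketch below assumes the traditional FuMa first-order encoding equations (W scaled by 1/sqrt(2), azimuth measured counter-clockwise from the front, elevation measured upward); the type and function names are hypothetical.

```rust
use std::f32::consts::FRAC_1_SQRT_2;

/// One first-order B-format sample (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BFormat {
    pub w: f32,
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Encodes one mono sample as a point source at the given direction.
/// Angles are in radians. Assumes FuMa scaling for W.
pub fn encode_mono(sample: f32, azimuth: f32, elevation: f32) -> BFormat {
    let cos_el = elevation.cos();
    BFormat {
        w: sample * FRAC_1_SQRT_2,
        x: sample * azimuth.cos() * cos_el,
        y: sample * azimuth.sin() * cos_el,
        z: sample * elevation.sin(),
    }
}
```

The encoded samples can then be summed with any other B-format signal and decoded once, at the end of the pipeline.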

so what i mean by all this: forcing the user to pick the decoder target (speaker layout) when reading a file, if this is what is proposed, may be premature. it is nice to work in the "b-format space" until the very end, then decode to the layout. so if reading a .wav or .amb file in symphonia forces the user to immediately pick a decoder layout, this might not be the optimal design (not flexible enough), if you understand my point.

ok sorry for long text, thank you!

@pdeljanov
Owner

Hi @aentity,

Once again, thanks for your detailed explanations.

Quite a bit to consider on my side. I'll see what changes we can incorporate into the 0.6 API to support this use-case better. I'm leaning towards adding an Ambisonic channel map. I'll make sure to @ you when I'm collecting feedback on the audio module rewrite changes.

For now, I'll merge this PR as-is. Thanks!

> hello. i don't understand the format CI failure; i had to turn off format-on-save, which is a small annoyance. i have repushed though, and it should be ready. thank you!

A nightly toolchain is required for rustfmt to support the brace style we use.

@pdeljanov pdeljanov merged commit 335d960 into pdeljanov:master Mar 2, 2024
11 checks passed
@aentity aentity deleted the ambisonic branch March 6, 2024 00:41