wav: add ambisonic bformat subtypes #263

aentity · 2024-02-18T03:45:26Z

Example ambisonic wav files can be found here: https://www.ambisonia.com/

pdeljanov · 2024-02-28T02:10:22Z

Thanks for this.

The symphonia-format-wav crate has been deprecated and will be removed permanently in 0.6.0. The new crate is symphonia-format-riff. This new crate is a superset of the old one, so you would need to make these same changes to it as well.

Aside from that, wouldn't this need a decoder to decode the ambisonics into proper PCM channels?

aentity · 2024-02-28T04:03:25Z

@pdeljanov hello.

i am happy to add these changes to riff. i don't know or use your library (yet) :)) but i just pushed these at same time as i did the hound push. ruuda/hound#72

so for your question: i may have already answered it in that thread, but it is a little complicated. if by PCM channel you mean the channel in the wave file, most ambisonic wav files you encounter are 4-channel wav files (with .amb extension) with the guids i added. the extension and the guid tell users that the file holds 4 channels of listenable sound (you can listen, it just sounds a little strange), but that each sample across channels (a b-format sample) needs further processing before listening.

as said this format is called the b-format. it is speaker layout independent. so this means the raw b-format must be decoded to a specific speaker target, perhaps binaural stereo, regular stereo, quadrophonic speakers, spherical, 5.1 surround, or any other thing humans can imagine.

however, as i said, i do not know how user reads in symphonia, how channels are represented, etc., but as a user of the library, i would expect to be able to read in a .wav or .amb file, collect the 4 separate channel samples into a b-format sample (usually represented as w, x, y, z), which can be either integral pcm or float.
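To make the "collect the 4 channel samples into a b-format sample" idea concrete, here is a minimal sketch. It is not Symphonia's or hound's API: `BFormat`, `from_interleaved`, and `i16_to_f32` are hypothetical names, and it assumes the file stores the channels interleaved in W, X, Y, Z order.

```rust
/// One first-order B-format sample (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BFormat {
    pub w: f32,
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Groups an interleaved 4-channel buffer (w0, x0, y0, z0, w1, ...) into
/// B-format samples. Assumes W, X, Y, Z channel order; trailing samples that
/// do not fill a whole frame are dropped.
pub fn from_interleaved(samples: &[f32]) -> Vec<BFormat> {
    samples
        .chunks_exact(4)
        .map(|f| BFormat { w: f[0], x: f[1], y: f[2], z: f[3] })
        .collect()
}

/// Converts signed 16-bit PCM samples to floats in [-1.0, 1.0), so integer
/// and float files can share the same B-format representation.
pub fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| f32::from(s) / 32768.0).collect()
}
```

With something like this, a reader only needs the raw interleaved samples from the library; everything ambisonic happens afterwards on `BFormat` values.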

in an ambisonic pipeline, we can now perform manipulations on the signal, or decode it to a speaker layout.

so yes, the user needs a decoder to transform the raw b-format into something the listener can listen to.

i think this is out of scope for symphonia? i do not know. i am working on decoder implementations right now for another project. but it needs an understanding of b-formats, decoder types, complicated math like the pseudoinverse, and other 3-dimensional manipulations (if desired).
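As a rough illustration of what "decoding to a speaker target" means, the simplest case is plain stereo using two virtual cardioid microphones. This is only a sketch under stated assumptions, not the decoder being discussed: it assumes the traditional Furse-Malham weighting (W stored at 1/sqrt(2)) and a Y axis that points left, and `BFormat` and `decode_stereo` are hypothetical names. Real decoders for arbitrary layouts are considerably more involved.

```rust
use std::f32::consts::SQRT_2;

/// One first-order B-format sample (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BFormat {
    pub w: f32,
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Decodes one B-format sample to (left, right) with two virtual cardioid
/// microphones aimed at +90 and -90 degrees azimuth.
/// Assumptions: W carries the 1/sqrt(2) Furse-Malham gain and Y points left.
/// X and Z do not contribute for microphones aimed along the Y axis.
pub fn decode_stereo(s: BFormat) -> (f32, f32) {
    let omni = SQRT_2 * s.w; // undo the assumed W attenuation
    (0.5 * (omni + s.y), 0.5 * (omni - s.y))
}
```

Under these assumptions, a unit source placed hard left comes out entirely in the left channel, and a source straight ahead is split equally between both.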

aentity · 2024-02-28T04:11:29Z

i have added the types to riff, thank you for telling.

i also should mention, in future, it would be nice to know that user is reading a file with bformat guid. is there a way to retrieve this tag information somehow programmatically?

pdeljanov · 2024-02-29T03:14:24Z

Hi @aentity,

Thanks for the explanation.

The current set of changes will be sufficient for you to access the 4 channels. However, there are some areas where support could be improved. This PR is probably good enough for now, but I'll list them below for future consideration:

1. The w, x, y, z channels will either be assigned the speaker positions designated in the channel mask of the WAVE header, or be designated as front left, front right, front center, and LFE (the first 4 speaker positions per the WAVE standard) if no mask is provided. This is because Symphonia 0.5.x doesn't have "unpositioned" channels (i.e., all channels must be assigned a speaker position). Symphonia 0.6 will add support for unpositioned channels which I believe is the more technically correct solution in this case because the decoder will synthesize the real channels. I have a design prototyped for this that I'll be collecting some feedback on soon. Perhaps you could take a look when it is ready.
2. Without making any other changes, a PCM codec type will be assigned to the codec parameters of the audio track. This means that the PCM decoder will be used for decoding. However, the PCM decoder doesn't know it has to perform additional work to synthesize the audio. It also wouldn't be appropriate to implement this synthesis in the PCM decoder. What could be done is assign an ambisonics PCM codec type when these GUIDs are encountered, and then implement a Decoder for the ambisonics codec types. You could then implement the calculations required to synthesize the actual channels in this decoder. This would be the proper solution within the Symphonia framework.
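The idea of assigning a distinct codec type when the ambisonic GUIDs are encountered could look roughly like the sketch below. None of this is Symphonia's actual API: `SubFormat` and `classify` are hypothetical names. The two standard sub-format GUIDs are the well-known KSDATAFORMAT values; the two ambisonic values are written from memory of the B-format file specification and should be checked against this PR's diff before being relied on.

```rust
/// Sample encodings selected by a WAVE_FORMAT_EXTENSIBLE sub-format GUID
/// (hypothetical type, not part of Symphonia).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubFormat {
    Pcm,
    IeeeFloat,
    AmbisonicBFormatPcm,
    AmbisonicBFormatFloat,
}

impl SubFormat {
    /// True when the samples are B-format and need an ambisonic decoder.
    pub fn is_ambisonic(self) -> bool {
        matches!(self, SubFormat::AmbisonicBFormatPcm | SubFormat::AmbisonicBFormatFloat)
    }
}

// GUIDs as stored in the file: the first three fields are little-endian.
const PCM: [u8; 16] = [
    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71,
];
const IEEE_FLOAT: [u8; 16] = [
    0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71,
];
// ASSUMPTION: ambisonic B-format GUID values, to be verified against the PR.
const AMBISONIC_PCM: [u8; 16] = [
    0x01, 0x00, 0x00, 0x00, 0x21, 0x07, 0xd3, 0x11, 0x86, 0x44, 0xc8, 0xc1, 0xca, 0x00, 0x00, 0x00,
];
const AMBISONIC_IEEE_FLOAT: [u8; 16] = [
    0x03, 0x00, 0x00, 0x00, 0x21, 0x07, 0xd3, 0x11, 0x86, 0x44, 0xc8, 0xc1, 0xca, 0x00, 0x00, 0x00,
];

/// Maps a raw sub-format GUID to a known encoding, or `None` if unrecognised.
pub fn classify(guid: &[u8; 16]) -> Option<SubFormat> {
    if *guid == PCM {
        Some(SubFormat::Pcm)
    } else if *guid == IEEE_FLOAT {
        Some(SubFormat::IeeeFloat)
    } else if *guid == AMBISONIC_PCM {
        Some(SubFormat::AmbisonicBFormatPcm)
    } else if *guid == AMBISONIC_IEEE_FLOAT {
        Some(SubFormat::AmbisonicBFormatFloat)
    } else {
        None
    }
}
```

Keeping the ambisonic variants separate from plain PCM is also what would let a caller detect programmatically that a file carries B-format audio.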

> i also should mention, in future, it would be nice to know that user is reading a file with bformat guid. is there a way to retrieve this tag information somehow programmatically?

There is no way currently as of 0.5.4. As mentioned in 2, assigning a codec type of ambisonics could be one way of detecting this.

pdeljanov · 2024-02-29T03:27:50Z

Another option could be introducing ambisonic channels. This may generalize better across different formats. For example, AAC compressed ambisonic audio channels in MP4.

aentity · 2024-02-29T04:02:57Z

hello. i don't understand the format CI failure; i had to turn off format-on-save, which is small annoying. i have repushed though and should be ready, thank you!

some response:

> Symphonia 0.6 will add support for unpositioned channels which I believe is the more technically correct solution in this case because the decoder will synthesize the real channels. I have a design prototyped for this that I'll be collecting some feedback on soon. Perhaps you could take a look when it is ready.

yes sounds more correct. and yes, please do ping me for design, but please note, i am not familiar with your library details :)

> However, the PCM decoder doesn't know it has to perform additional work to synthesize the audio. It also wouldn't be appropriate to implement this synthesis in the PCM decoder. What could be done is assign an ambisonics PCM codec type when these GUIDs are encountered, and then implement a Decoder for the ambisonics codec types. You could then implement the calculations required to synthesize the actual channels in this decoder. This would be the proper solution within the Symphonia framework.

i see. this differs from simpler libraries like hound, which does not have a real notion of codec like this. (e.g., i am writing something like this). i am not sure how far you want to take symphonia responsibility there. i think it may draw in extra dependencies for you, but it could be interesting. i do not know.

i am just used to reading file, extracting interleaved channels, representing as i want, and then performing operations, for example.

one thing i must stress:

> You could then implement the calculations required to synthesize the actual channels in this decoder

one purpose of ambisonic b-format is that it is useful to perform manipulation on this raw format, transforms, mixing, etc., and then at very end, decode into real speaker layout. it is more of a late stage plugin. as they say, it is 'speaker agnostic'.

similarly, there is inverse, where user wants to encode into b-format perhaps a mono signal, (or from a-format, which is what microphones pickup usually, but this is another story), to be mixed with other b-format signals.

so what i say by all this, maybe (if this is proposed) forcing user to pick the decoder target (speaker layout) on reading of a file is premature. it is nice to work in the "b-format space" until very end, then decode to the layout. so if reading a .wav or .amb file in symphonia forces to immediately pick a decoder layout, this might not be optimal design (not flexible enough), if you understand my point.
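A small sketch of what working in "b-format space" can look like: encoding a mono sample, rotating the sound field, and mixing signals, all before any speaker layout is chosen. This is illustrative only. `BFormat`, `encode_mono`, `rotate_z`, and `mix` are hypothetical names, and the encoding assumes Furse-Malham weighting (W scaled by 1/sqrt(2)), angles in radians, and a Y axis that points left.

```rust
use std::f32::consts::FRAC_1_SQRT_2;

/// One first-order B-format sample (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BFormat {
    pub w: f32,
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Encodes a mono sample arriving from the given direction (radians).
/// Assumes Furse-Malham weighting and positive azimuth towards the left.
pub fn encode_mono(sample: f32, azimuth: f32, elevation: f32) -> BFormat {
    BFormat {
        w: sample * FRAC_1_SQRT_2,
        x: sample * azimuth.cos() * elevation.cos(),
        y: sample * azimuth.sin() * elevation.cos(),
        z: sample * elevation.sin(),
    }
}

impl BFormat {
    /// Rotates the sound field about the vertical axis by `angle` radians.
    /// Only X and Y change; W (omni) and Z (height) are unaffected.
    pub fn rotate_z(self, angle: f32) -> BFormat {
        let (sin, cos) = angle.sin_cos();
        BFormat {
            w: self.w,
            x: self.x * cos - self.y * sin,
            y: self.x * sin + self.y * cos,
            z: self.z,
        }
    }

    /// Mixes two B-format samples; mixing is a plain component-wise sum.
    pub fn mix(self, other: BFormat) -> BFormat {
        BFormat {
            w: self.w + other.w,
            x: self.x + other.x,
            y: self.y + other.y,
            z: self.z + other.z,
        }
    }
}
```

None of these operations needs to know the target speakers, which is why choosing the decoder layout can be left to the very last stage.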

ok sorry for long text, thank you!

pdeljanov · 2024-03-02T16:55:05Z

Hi @aentity,

Once again, thanks for your detailed explanations.

Quite a bit to consider on my side. I'll see what changes we can incorporate into the 0.6 API to support this use-case better. I'm leaning towards adding an Ambisonic channel map. I'll make sure to @ you when I'm collecting feedback on the audio module rewrite changes.

For now, I'll merge this PR as-is. Thanks!

> hello. i don't understand the format CI failure; i had to turn off format-on-save, which is small annoying. i have repushed though and should be ready, thank you!

A nightly toolchain is required for rustfmt to support the brace style we use.

aentity force-pushed the ambisonic branch from efe9b0f to dc6b1a3 Compare February 28, 2024 04:10

wav: add ambisonic bformat subtypes

19619ef

aentity force-pushed the ambisonic branch from dc6b1a3 to 19619ef Compare February 29, 2024 03:52

pdeljanov merged commit 335d960 into pdeljanov:master Mar 2, 2024
11 checks passed

aentity deleted the ambisonic branch March 6, 2024 00:41
