[WIP] Support for encodings using floating-point values #197

JeromeMartinez · 2020-04-28T10:43:33Z

This pull request expands FFV1 to provide lossless compression for additional pixel formats by adding support for floating-point values. It permits lossless encoding and decoding of the floating-point “pix_fmt” values that currently exist in FFmpeg (AV_PIX_FMT_GBRPF32, AV_PIX_FMT_GBRAPF32, AV_PIX_FMT_GRAYF32) as well as their (not yet existing) 16-bit counterparts.
Video formats such as EXR can use floating-point values.

Note about the implementation in FFmpeg: As FFmpeg does not (yet) support 16-bit floating-point RGB pixel formats, I plan to send a patch for ffv1dec using exactly the same decoding method as FFmpeg uses for EXR (applying the lossy conversion-to-integer function from the EXR implementation after FFV1 decoding, no encoding), with a decoding message about the lossy conversion. The additional complexity is minimal (a test on colorspace_type in order to apply the float to int conversion after decoding).

Note about the YCbCr part: as FFmpeg has AV_PIX_FMT_GRAYF32, I prefer to anticipate the support of such pix_fmt in order to have a coherent specification, by just increasing the colorspace_type value by 2 for each previous colorspace_type value.

Potential optimizations: In practice for 16-bit content not all bits are used (bit 15 is the sign and so always 0 and bit 14 is for more than 1.0 and so always 0); however, in theory it is possible to have values that are negative or greater than 1 (see AllHalfValues.exr description in https://github.com/AcademySoftwareFoundation/openexr-images/tree/master/TestImages ) so we cannot simply omit these bits in the Parameters. Optimization by reducing the bit depth requires more complex changes (similar to how we could reduce Y bit depth to bit_depth instead of bit_depth+1 with colorspace_type of 1, as only Cb and Cr have a range of bit_depth+1 bits), which would be implemented in version 4. The idea of the implementation in version 3 is to keep the decoder nearly untouched.
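As a rough illustration of the bit layout discussed here, this Python sketch prints the sign bit and the top exponent bit of a few IEEE 754 half-precision values. The helper name `half_bits` is made up for this example. Note that bit 14 is only set once the magnitude reaches 2.0 (or for infinities and NaNs), so values between 1.0 and 2.0 still leave it at 0.

```python
import struct

def half_bits(value: float) -> int:
    """Return the 16-bit binary16 pattern of value (hypothetical helper)."""
    # "<e" packs an IEEE 754 half float, "<H" reads the same bytes as uint16.
    return struct.unpack("<H", struct.pack("<e", value))[0]

for v in (0.0, 0.5, 1.0, 1.5, 2.0, -0.5):
    bits = half_bits(v)
    sign = bits >> 15          # bit 15: sign
    bit14 = (bits >> 14) & 1   # bit 14: most significant exponent bit
    print(f"{v:5} -> 0x{bits:04X} sign={sign} bit14={bit14}")
```

For typical non-negative content below 2.0 both top bits stay at 0, which is why they look like free savings, but files such as AllHalfValues.exr exercise every pattern.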

This link has some sample files as a proof of concept (the colorspace_type value in the FFV1 bitstream is wrong for both the DPX header and FFV1 bitstream but permits decoding the floating-point numbers with current FFmpeg; the FFmpeg patches are hacks for moving the float to int algorithm from the EXR decoder to the FFV1 decoder).

michaelni · 2020-04-28T16:13:35Z

Note about the implementation in FFmpeg: As FFmpeg does not (yet) support 16-bit floating-point RGB pixel formats, I plan to send a patch for ffv1dec using exactly the same decoding method as FFmpeg uses for EXR (applying the lossy conversion-to-integer function from the EXR implementation after FFV1 decoding, no encoding), with a decoding message about the lossy conversion. The additional complexity is minimal (a test on colorspace_type in order to apply the float to int conversion after decoding).

Please don't. Instead add a pixel format. If adding support in some part like swscale is too hard then just skip this, but do not add more hacks like a downconvert to 16-bit integers.

Note about the YCbCr part: as FFmpeg has AV_PIX_FMT_GRAYF32, I prefer to anticipate the support of such pix_fmt in order to have a coherent specification, by just increasing the colorspace_type value by 2 for each previous colorspace_type value.

colorspace_type is the wrong field to indicate float formats.
Why do you not add a new field? It will not decode with existing decoders anyway.

Also, I would not drop the "integer" from the transform. Whatever is done, it needs to be an integer transform, else precision and rounding become an issue with floats and lossless.

JeromeMartinez · 2020-04-28T17:24:26Z

Please dont. Instead add a pixel format.

I was planning to split the work, in order to reduce the number of changes in the corresponding patch, but I am fine with adding 16-bit float support in FFmpeg at the same time if I don't have to do the swscale part at the same time, so:

If adding support in some part like swscale is too hard then just skip this

That makes the patch easier. I was afraid of this task and of having my patch rejected if I did not do it, which is why I was planning to do it the same way as in EXR.

why do you not add a new field ?

I don't get that, how? IIUC, adding a new field will make old decoders just skip it, not what we would want. using colorspace_type is exactly for forbidding old decoders to decode the bitstream as integer values.
Please hint about where you would add this new field and at the same time having the bitstream rejected by older decoders.

Also, I would not drop the "integer" from the transform. Whatever is done, it needs to be an integer transform, else precision and rounding become an issue with floats and lossless.

The idea is definitely the opposite, I'll keep the word and add a "MUST consider float values as integers" somewhere else.
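A minimal sketch of what "consider float values as integers" can mean in practice, assuming 32-bit little-endian samples: the bytes are reinterpreted as unsigned integers without any floating-point arithmetic, so the round trip is bit-exact by construction. The function names are hypothetical and not part of FFV1 or FFmpeg.

```python
import struct

def floats_as_ints(raw: bytes) -> list[int]:
    """Reinterpret little-endian float32 bytes as uint32 values (no rounding)."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}I", raw))

def ints_as_floats(values: list[int]) -> bytes:
    """Inverse of floats_as_ints: write the uint32 values back as raw bytes."""
    return struct.pack(f"<{len(values)}I", *values)

raw = struct.pack("<4f", 0.0, 0.5, 1.0, -2.25)  # sample float32 pixels
coded = floats_as_ints(raw)       # what an integer-only coder would see
restored = ints_as_floats(coded)  # what the decoder hands back
print([hex(v) for v in coded], restored == raw)
```

Because only the integer view is ever transformed, the integer RCT and predictor stay exact and no float rounding mode is involved.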

michaelni · 2020-04-28T18:10:31Z

why do you not add a new field ?

I don't get that, how? IIUC, adding a new field will make old decoders just skip it, not what we would want. using colorspace_type is exactly for forbidding old decoders to decode the bitstream as integer values.
Please hint about where you would add this new field and at the same time having the bitstream rejected by older decoders.

Does a version 3 decoder decode it to something meaningful?
If not, it is not a version 3 feature, so it can be added only to later versions.

Version 4 is not final yet, so any decoder attempting to decode it accepts potential failure.
Anyone implementing a decoder can choose not to decode not-yet-final versions, to avoid attempting to decode a file and then failing.
We could bump versions more often to get finer-grained behavior; not sure whether that's a good idea or not.

If you want to add a method of more fine-grained "file support" detection, that's fine, it wouldn't be a bad thing. But please don't hack new features into semantically wrong fields.

michaelni · 2020-04-28T19:21:02Z

Potential optimizations: In practice for 16-bit content not all bits are used (bit 15 is the sign and so always 0 and bit 14 is for more than 1.0 and so always 0); however, in theory it is possible to have values that are negative or greater than 1 (see AllHalfValues.exr description
in https://github.com/AcademySoftwareFoundation/openexr-images/tree/master/TestImages ) so we can not simply omit these bits in the Parameters.

Instead of one float flag in the header we could add fields that specify the number of bits in the mantissa, whether there is a sign bit, and the range of exponents (larger than 1 and how much detail around 0). This would be a superset for single and double precision IEEE floats. It should also improve speed, as fewer "always 0" planes would be stored.
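To make the idea of descriptive header fields more concrete, here is a hedged Python sketch. The `FloatLayout` class and its field names are purely hypothetical (nothing like this exists in the FFV1 specification); it only shows how half, single and double precision become instances of one parameterized description, and how dropping the sign bit removes one stored plane.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FloatLayout:
    """Hypothetical description of a float sample format."""
    has_sign: bool       # whether a sign bit is stored
    exponent_bits: int   # width of the exponent field
    mantissa_bits: int   # width of the mantissa (fraction) field

    @property
    def stored_bits(self) -> int:
        """Number of bit planes a coder would actually have to store."""
        return int(self.has_sign) + self.exponent_bits + self.mantissa_bits

HALF = FloatLayout(True, 5, 10)     # IEEE 754 binary16
SINGLE = FloatLayout(True, 8, 23)   # IEEE 754 binary32
DOUBLE = FloatLayout(True, 11, 52)  # IEEE 754 binary64
UNSIGNED_HALF = FloatLayout(False, 5, 10)  # assumed: non-negative content only

for layout in (HALF, SINGLE, DOUBLE, UNSIGNED_HALF):
    print(layout, layout.stored_bits)
```

A real proposal would also need to describe the exponent range, which this sketch leaves out.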

retokromer · 2020-04-29T09:29:35Z

I strongly support the idea of implementing floating-point formats in version 4 only, avoiding a “hack” for version 3, and doing it consistently from scratch.

michaelni · 2024-02-11T13:29:48Z

Any update on this?
I can maybe work on this if no one else has.

richardpl · 2024-02-11T13:55:34Z

16-bit float to 32-bit float conversion in the EXR decoder with tables is not lossy, it's lossless: no data is lost.
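The losslessness of widening half to single precision can be checked exhaustively. The sketch below is not the table-based EXR decoder code; it uses Python's `struct` module as a stand-in and round-trips every non-NaN 16-bit pattern through float32. NaN patterns are skipped here only because payload preservation depends on the platform conversion, which is an assumption of this sketch rather than a statement about EXR.

```python
import struct

def half_to_single_and_back(bits: int) -> int:
    """Widen a binary16 pattern to float32, then narrow it again."""
    value = struct.unpack("<e", struct.pack("<H", bits))[0]
    single = struct.unpack("<f", struct.pack("<f", value))[0]
    return struct.unpack("<H", struct.pack("<e", single))[0]

def is_nan(bits: int) -> bool:
    """True for binary16 NaN patterns (all-ones exponent, non-zero mantissa)."""
    return (bits & 0x7C00) == 0x7C00 and (bits & 0x03FF) != 0

mismatches = [b for b in range(0x10000)
              if not is_nan(b) and half_to_single_and_back(b) != b]
print(len(mismatches))
```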

JeromeMartinez · 2024-02-11T14:14:31Z

I don't have any specific update on this. On our side we went with an awful but working hack, handling EXR 16-bit float as integer and signaling EXR 16-bit float through a side channel (compression is in practice like 16-bit integer, 50% compression on average), but we are still interested in moving to something less hacky.

In my opinion the discussion is more about how to signal the pix_fmt, and maybe about avoiding compressing bit planes that contain only 0, rather than about having a complex new path for compressing and decompressing. It could also be reused for integers, as sometimes some bits are 0 padding or the content is very dark (so the highest bit is always 0 in a slice).

State of my thoughts about a superset of changes for v4:

- Keep the SliceContent spec untouched: no additional complexity in this part without an argument that it would help compression specifically for float.
- Consider the bit depth in ConfigurationRecord as the maximum bit depth, and add just an integer/float flag (or something more precise, with the mantissa bit count or any other hint about the mapping of 16-bit values to 32-bit values, etc.) for the decoded output configuration.
- Whatever the type (so including integer; no specific code depending on the pixel type, we keep the bitstream simple), add a hint in SliceHeader about the count of higher and lower bits that are 0 in all pixels, and do not compress these bits.

Rationale is that we don't know in advance the content which could be e.g. negative or more than 1 in only one frame at the end, so we don't prevent this possibility at encoder and decoder init but we permit speed optimization by limiting the count of bits managed by the range coder, on the decoder side it is only an extra bit shift in the case of lower bits having only 0 and nothing for higher bits.
In practice for float it permits a 15-bit (or less) encoding for 99.999% of frames (frames with negative content are so uncommon that it is not useful to optimize for them), so using 16-bit after RCT (no need of 32-bit intermediate storage), and sometimes it may also help for integers (sometimes 16-bit intermediate storage only, possibility to have the range coder "overflow" so sometimes better compression, to be demonstrated).
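The per-slice hint described above could be computed roughly as follows. This is only a sketch: the function names and the idea of returning a `(high, low)` pair are assumptions for illustration, not a proposed SliceHeader syntax. The encoder ORs all samples together, counts the always-zero bits at both ends, and shifts the low ones away; the decoder only needs one left shift.

```python
def zero_bit_hint(samples: list[int], bit_depth: int) -> tuple[int, int]:
    """Count bits that are 0 in every sample: (high_zero_bits, low_zero_bits)."""
    combined = 0
    for sample in samples:
        combined |= sample
    if combined == 0:
        return bit_depth, 0  # every plane is zero; nothing to shift
    high = bit_depth - combined.bit_length()
    low = (combined & -combined).bit_length() - 1  # index of lowest set bit
    return high, low

def reduce_samples(samples: list[int], low: int) -> list[int]:
    """Encoder side: drop the always-zero low bits before coding."""
    return [sample >> low for sample in samples]

def restore_samples(reduced: list[int], low: int) -> list[int]:
    """Decoder side: a single left shift restores the original values."""
    return [sample << low for sample in reduced]

slice_samples = [0x3C00, 0x3800, 0x3400, 0x0000]  # half-float patterns as ints
high, low = zero_bit_hint(slice_samples, 16)
reduced = reduce_samples(slice_samples, low)
coded_bits = 16 - high - low
print(high, low, coded_bits, restore_samples(reduced, low) == slice_samples)
```

Since the hint is per slice, a later frame with negative or larger values simply reports fewer zero bits, without any change at encoder or decoder init.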

What other optimizations do you see for this topic?

michaelni · 2024-02-11T17:31:48Z

In my opinion the discussion is more about how to signal the pix_fmt and maybe avoiding to compress only 0 bit planes rather than having a complex and new path for compressing and decompressing. It could also be reused for integers as sometimes some bits are 0 padding or very dark (so highest bit always 0 in a slice).

I think there are 3 different things here.

1. signaling floats (16, 32, 64)
2. improvements in how symbols are compressed, which may apply to both integers and floats
3. a totally new coder for floats

We can do 1+2 and treat each independently, or look at 3 first and then decide.

Rationale is that we don't know in advance the content which could be e.g. negative or more than 1 in only one frame at the end, so we don't prevent this possibility at encoder and decoder init but we permit speed optimization by limiting the count of bits managed by the range coder, on the decoder side it is only an extra bit shift in the case of lower bits having only 0 and nothing for higher bits. In practice for float it permits a 15-bit (or less) encoding for 99.999% of frames (the ones with non negative content which are so uncommon that it is not useful to optimize for them) so using 16-bit after RCT (no need of 32-bit intermediate storage) and sometimes it may also help for integers (sometimes 16-bit intermediate storage only, possibility to have the range coder to "overflow" so sometimes better compression, to be demonstrated).

What other optimizations do you see for this topic?

we have at least 3 things.

- RCT
- quantization
- predictor

All 3 are wrong for floats, in the sense of not being "homomorphic". That is, if you take a few integers and a few floats that are equivalent in some sense, then these operations do not do the same thing to both.
You can see this if you consider that, with integers, the predictor output will always increase by x if all its inputs are offset by x. But you will not see this effect with floats. It will be more chaotic.
We should test the correct corresponding operations to better understand their performance difference before simply using the "wrong" integer ones.
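The offset argument can be tried on a toy case. The sketch below assumes the usual median predictor form, median(L, T, L + T - TL), applies it once to plain integers and once to the integer bit patterns of half floats, and compares what happens when all inputs are offset. The helper names and the sample values are invented for the example.

```python
import struct

def median_predict(left: int, top: int, top_left: int) -> int:
    """Median of left, top and the gradient left + top - top_left."""
    return sorted((left, top, left + top - top_left))[1]

def half_to_bits(value: float) -> int:
    return struct.unpack("<H", struct.pack("<e", value))[0]

def bits_to_half(bits: int) -> float:
    return struct.unpack("<e", struct.pack("<H", bits))[0]

# Integers: offsetting every neighbour offsets the prediction by the same amount.
neighbours = (100, 120, 110)
offset = 50
base = median_predict(*neighbours)
shifted = median_predict(*(n + offset for n in neighbours))
print(shifted - base == offset)

# Floats coded as bit patterns: the same offset no longer carries through.
values = (0.25, 1.0, 0.5)
f_offset = 1.0
base_f = bits_to_half(median_predict(*(half_to_bits(v) for v in values)))
shifted_f = bits_to_half(
    median_predict(*(half_to_bits(v + f_offset) for v in values)))
print(base_f, shifted_f, shifted_f - base_f == f_offset)
```

This says nothing about how much compression is lost in practice; it only shows that the integer predictor does not behave the same way on float bit patterns.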

richardpl · 2024-02-11T19:35:36Z

Treating floats as integers when compressing? I doubt one can get any big compression gain that way, at least for audio case it is bad...

JeromeMartinez · 2024-02-12T22:53:46Z

Treating floats as integers when compressing? I doubt one can get any big compression gain that way, at least for audio case it is bad...

I can not share the files but real use case by a RAWcooked user (non relevant things removed, same FFV1 config with v3 and 576 slices):

$ rawcooked 00000.exr
$ rawcooked 00000.exr.mkv
$ lzma --keep --extreme 00000.exr
$ ffmpeg -i 00000.exr -c:v ffv1 -slices 576 00000.rgb48.mkv
$ ls -l
49801411 00000.exr
20294634 00000.exr.mkv
23972454 00000.exr.lzma
49801411 00000.RAWcooked.exr
29595221 00000.rgb48.mkv

- 00000.exr.mkv is made with a hack handling float as int, then FFV1.
- 00000.rgb48.mkv is FFmpeg (lossy) converting float to int, then FFV1, as an example of how it is encoded as integer (I know, not the same content, just saying that I would not like to convert to int then compress...).
- 00000.RAWcooked.exr is the result of reverting from FFV1 to EXR with RAWcooked and is bit-by-bit identical to 00000.exr.
- 00000.exr.lzma is for reference, showing what a (very slow) generic compression with one of the best compressors can do; usually FFV1 produces something a bit smaller than this with 10- or 16-bit integer content, so FFV1 behavior with float as int is really similar to what it does with int.

TLDR, FFV1 compresses this file by 60%! More generally we have an average compression ratio like that, better than our 16-bit int (easy, lot of MSB at 0... But still good!) which is ~50% compression.
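For reference, the percentages can be recomputed from the file sizes in the listing above. This small Python sketch only does the arithmetic; the dictionary and function names are made up for the example.

```python
sizes = {
    "00000.exr": 49801411,        # original EXR
    "00000.exr.mkv": 20294634,    # float handled as int, then FFV1
    "00000.exr.lzma": 23972454,   # lzma --extreme
    "00000.rgb48.mkv": 29595221,  # lossy float to int, then FFV1
}

def saving_percent(compressed: int, original: int) -> float:
    """Size reduction relative to the original, in percent."""
    return 100.0 * (1.0 - compressed / original)

original = sizes["00000.exr"]
savings = {name: saving_percent(size, original)
           for name, size in sizes.items() if name != "00000.exr"}
for name, saving in savings.items():
    print(f"{name}: {saving:.1f}% smaller")
```

The float-as-int FFV1 file comes out a little over 59% smaller, which matches the rounded 60% figure.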

Users appreciate this compression ratio a lot and prefer it to storing EXR files as is, and I doubt we could really do a lot better without a lot of changes in FFV1; the current issue is not the compression ratio but the fact that there is no standard signaling of float.

michaelni · 2024-02-12T23:16:48Z

Treating floats as integers when compressing? I doubt one can get any big compression gain that way, at least for audio case it is bad...

I can not share the files but real use case by a RAWcooked user (non relevant things removed, same FFV1 config with v3 and 576 slices):

I think we should switch to files that can be shared.

00000.exr.mkv is with a hack handling float as int then FFV1 00000.rgb48.mkv is FFmpeg (lossy) converting float to int then FFV1, for example about how it is encoded if integer (I know, not same content, just saying that I would not like to convert to int then compress...) 00000.RAWcooked.exr is the reverting process to EXR from FFV1 by RAWcooked and is bit-by-bit identical to 00000.exr. 00000.exr.lzma is for reference about what can do a (very slow) generic compression with one of the best compressors, usually FFV1 does with 10 or 16 bit integer content something a bit smaller than this compression, so FFV1 behavior with float as int is really similar to what it does with int.

TLDR, FFV1 compresses this file by 60%! More generally we have an average compression ratio like that, better than our 16-bit int (easy, lot of MSB at 0... But still good!) which is ~50% compression.

Users appreciate a lot to have this compression ratio and prefer to have this one rather than storing EXR files as is, and I have doubt we could really do a lot better without a lot of changes in FFV1, current issue is not the compression ratio but the fact that there is no standard signaling of float.

There's a chance FFV1 maintenance work this year, and especially development of float support, will be funded. If that's the case I intend to investigate more completely how to optimally handle floats. The variant of simply treating them as integers isn't bad and I suggest we support that too as it adds 0 complexity, but I agree with Paul that it should be possible to do better than that.

richardpl · 2024-02-13T08:36:50Z

Are these real 32bit float EXR files - with natural (camera footage) content and one with synthetic (blender rendered ones) non trivial content? EXR have just bad lossless 32-bit float compressions IIRC.

If the current or a future coder in FFV1 can make extra reductions with mantissa and exponent bits (with no need for separate coding of the two), that would be a major win.

JeromeMartinez · 2024-02-13T08:53:23Z

I think we should switch to files that can be shared.

I wish I had that... And I am interested in such files, because it seems very hard to get such content. I developed my hack "blind" and it was enough for my needs.

In the meantime, 16-bit float non real use cases e.g.:

- filesamples.com: -44% with float, -40% with lossy conversion to int, -58% with lzma (well... it is a synthetic picture... relatively classic, and the gain there is good to have but it is so slow...).
- ACES_ODT_SampleFrames has ~100 different frames with different content, but none of them seems to be real footage: -33%.

FYI tests are made with this ugly patch for FFmpeg and the lossless compression is confirmed when the added option is used.

as it adds 0 complexity but i agree with paul that it should be possible to do better than that.

It would be great if there is a demonstration that the additional complexity is worth it, I like FFV1 also because of its "low" complexity (very small code size compared to some other lossless formats).

retokromer · 2024-02-13T09:35:56Z

I used as a starting point: https://openexr.com/en/latest/_test_images/index.html

Yet I will check whether some of our clients are willing to share examples publicly.

Support for encodings using floating-point values

6eb0c1a

JeromeMartinez changed the title ~~Support for encodings using floating-point values~~ [WIP] Support for encodings using floating-point values May 25, 2020
