Possible performance improvements to half float conversion #76

elasota · 2018-08-18T07:26:22Z

XMConvertHalfToFloat and XMConvertFloatToHalf both use a large number of integer ops when F16 intrinsics aren't available. It may be faster to do it with floating point operations. XMConvertHalfToFloat has a while loop for denormals, which is particularly slow.

Float-to-half conversion can use a trick: for positive numbers, (f + max(f, 2^-24)) produces a float whose exponent sits at a fixed bias from the half float's exponent, handles denormals and zero, and needs only 2 ops. (Bit-exactness in this case is sensitive to how the dropped mantissa bits are handled in the denormal case, though.)
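
A rough sketch of the add-and-shift idea, for illustration only. The function name is hypothetical, and the sketch assumes a finite, non-negative input that fits in half range (no sign, NaN, infinity, or overflow handling). The threshold used here is 2^-14, the smallest normal half, which is an assumption of this sketch: with that value the float exponent field lands exactly 113 above the half exponent field for both normal and denormal results. It truncates the dropped mantissa bits and makes no attempt to match the rounding of XMConvertFloatToHalf.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of DirectXMath.
// Assumes 0 <= f <= 65504 and f is finite; truncates instead of rounding.
inline std::uint16_t PositiveFloatToHalfTruncate(float f)
{
    // 2^-14, the smallest normal half (assumed threshold for this sketch).
    const float kMinNormalHalf = 6.103515625e-5f;

    // Normal range: f + f bumps the exponent by one, mantissa unchanged.
    // Denormal range: f + 2^-14 pins the exponent at -14 and shifts f's
    // bits into the mantissa at the position the half denormal expects.
    const float shifted = f + std::max(f, kMinNormalHalf);

    std::uint32_t bits;
    std::memcpy(&bits, &shifted, sizeof(bits));

    // Remove the constant exponent offset (113), then drop the 13 low
    // mantissa bits that a half cannot store.
    return static_cast<std::uint16_t>((bits - (113u << 23)) >> 13);
}
```

The two floating-point ops are the `max` and the add; the integer subtract and shift are the repacking step. A production version would still need sign extraction, overflow and NaN handling, and a decision about rounding.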

Half-to-float can handle denormals (and zero) by converting the mantissa to float and multiplying it by 2^-24, which should be faster than the loop.

walbourn added the optimization label May 31, 2019
