XMConvertHalfToFloat and XMConvertFloatToHalf both use a large number of integer ops when F16 intrinsics aren't available. It may be faster to do it with floating point operations. XMConvertHalfToFloat has a while loop for denormals, which is particularly slow.
Float-to-half conversion can use a trick: for positive numbers, (f + max(f, 2^-24)) produces a float whose exponent sits at a fixed bias from the half-float exponent. This also handles denormals and zero, and needs only 2 ops. (Bit-exactness here is sensitive to how the dropped mantissa bits are handled in the denormal case, though.)
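Half-to-float can handle denormals (and zero) by converting the mantissa to float and multiplying it by 2^-24, which should be faster than the loop. This works because a half with a zero exponent field has the value mantissa × 2^-24, which a float represents exactly.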
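As a rough illustration of the half-to-float suggestion, here is a minimal scalar sketch. It is not the DirectXMath implementation: the function name `HalfToFloatSketch` is hypothetical, and it assumes IEEE 754 binary16/binary32 layouts and an FPU that does not flush the result. The zero/denormal branch replaces the normalization loop with one int-to-float conversion and one multiply.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration only; not part of DirectXMath.
inline float HalfToFloatSketch(std::uint16_t h)
{
    const std::uint32_t sign     = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x03FFu;

    std::uint32_t bits;
    if (exponent == 0)
    {
        // Zero or denormal: value is mantissa * 2^-24, exact in float.
        // No loop is needed to locate the leading mantissa bit.
        const float magnitude =
            static_cast<float>(mantissa) * 5.9604644775390625e-8f; // 2^-24
        std::memcpy(&bits, &magnitude, sizeof(bits));
        bits |= sign; // reapply the sign (also yields -0.0f)
    }
    else if (exponent == 0x1Fu)
    {
        // Infinity or NaN: all-ones float exponent, widen the payload.
        bits = sign | 0x7F800000u | (mantissa << 13);
    }
    else
    {
        // Normal: rebias the exponent (127 - 15 = 112), widen the mantissa.
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    }

    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```

A SIMD version could apply the same idea by computing both the normal and the denormal result and selecting per lane on the exponent field. Whether that beats the existing integer path would need measuring.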