Allow 1 mantissa bit diff in TestFused8BitRowwiseQuantizationConversion (pytorch#2015)

Summary:
Pull Request resolved: pytorch#2015

The reference implementation of FP8 quantization is in Python, but the
actual implementation is in C++/CUDA. Per summerdengfb's investigation,
Python has a known floating-point representation issue
(https://www.geeksforgeeks.org/floating-point-error-in-python/), which
can cause discrepancies in the quantization results. To work around this
issue, we allow a 1-bit difference (the LSB of the mantissa) in the FP8
quantization result in `TestFused8BitRowwiseQuantizationConversion`.
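
For context, a minimal sketch (hypothetical values, not part of this patch) of
how a single flipped mantissa LSB in the packed FP8 bytes maps onto the
`atol=1` tolerance used in the diff below:

```python
import numpy as np

# Pretend these are packed FP8 rows from the Python reference and from the
# C++/CUDA kernel; the values are made up and differ only in the last
# mantissa bit of two elements.
reference = np.array([[0x42, 0x7F, 0x00]], dtype=np.uint8)
actual = np.array([[0x43, 0x7E, 0x00]], dtype=np.uint8)

# Exact equality would fail here:
# np.testing.assert_array_equal(actual, reference)

# Allowing an absolute difference of 1 accepts the LSB flip.
np.testing.assert_allclose(actual, reference, atol=1)
```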

Reviewed By: q10, shintaro-iwasaki

Differential Revision: D49255499

fbshipit-source-id: b28294f8076bda61589e10699119375f03b091a8
sryap authored and facebook-github-bot committed Sep 14, 2023
1 parent 49058dc commit 45ec826
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion fbgemm_gpu/test/quantize_ops_test.py
@@ -118,7 +118,10 @@ def test_quantize_op(
         ncols_aligned = (ncols + 4 - 1) // 4 * 4
         # compare quantized data
         np.testing.assert_allclose(
-            quantized_data_numpy[:, :ncols], reference[:, :ncols]
+            quantized_data_numpy[:, :ncols],
+            reference[:, :ncols],
+            # Allow 1 mantissa bit difference (LSB)
+            atol=1,
         )
         # compare scales
         np.testing.assert_array_almost_equal(
