Allow 1 mantissa bit diff in TestFused8BitRowwiseQuantizationConversion (pytorch#2015)

Summary:
Pull Request resolved: pytorch#2015

The reference implementation of FP8 quantization is in Python, but the
actual implementation is in C++/CUDA. Per summerdengfb's investigation,
Python has a known floating-point representation issue
(https://www.geeksforgeeks.org/floating-point-error-in-python/), which
can cause discrepancies in the quantization results. To work around this
issue, we allow a 1-bit difference (the LSB of the mantissa) in the FP8
quantization result in `TestFused8BitRowwiseQuantizationConversion`.
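
For context, a minimal sketch (hypothetical values, not part of this patch) of
how a single flipped mantissa LSB in the packed FP8 bytes maps onto the
`atol=1` tolerance used in the diff below:

```python
import numpy as np

# Pretend these are packed FP8 rows from the Python reference and from the
# C++/CUDA kernel; the values are made up and differ only in the last
# mantissa bit of two elements.
reference = np.array([[0x42, 0x7F, 0x00]], dtype=np.uint8)
actual = np.array([[0x43, 0x7E, 0x00]], dtype=np.uint8)

# Exact equality would fail here:
# np.testing.assert_array_equal(actual, reference)

# Allowing an absolute difference of 1 accepts the LSB flip.
np.testing.assert_allclose(actual, reference, atol=1)
```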

Reviewed By: q10, shintaro-iwasaki

Differential Revision: D49255499

fbshipit-source-id: b28294f8076bda61589e10699119375f03b091a8
sryap authored and facebook-github-bot committed Sep 14, 2023
1 parent 49058dc commit 45ec826
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion fbgemm_gpu/test/quantize_ops_test.py
@@ -118,7 +118,10 @@ def test_quantize_op(
         ncols_aligned = (ncols + 4 - 1) // 4 * 4
         # compare quantized data
         np.testing.assert_allclose(
-            quantized_data_numpy[:, :ncols], reference[:, :ncols]
+            quantized_data_numpy[:, :ncols],
+            reference[:, :ncols],
+            # Allow 1 mantissa bit difference (LSB)
+            atol=1,
         )
         # compare scales
         np.testing.assert_array_almost_equal(
