
Handling of NaN and infinity #119

Closed

josevalim opened this issue Dec 28, 2020 · 8 comments
Labels
area:nx Applies to nx kind:feature New feature or request

Comments

@josevalim
Collaborator

Today Nx operations fail if they find a NaN and/or Infinity (although defn behaviour will be compiler independent). Do we need to implement handling of NaN and infinity within Nx? What are the use cases?

@jackalcooper
Collaborator

One use case: dynamic scale. From TF docs:

Dynamic loss scaling works by adjusting the loss scale as training progresses. The goal is to keep the loss scale as high as possible without overflowing the gradients. As long as the gradients do not overflow, raising the loss scale never hurts.
The algorithm starts by setting the loss scale to an initial value. Every N steps that the gradients are finite, the loss scale is increased by some factor. However, if a NaN or Inf gradient is found, the gradients for that step are not applied, and the loss scale is decreased by the factor. This process tends to keep the loss scale as high as possible without gradients overflowing.

https://www.tensorflow.org/api_docs/python/tf/mixed_precision/experimental/DynamicLossScale
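
As an aside (not from the thread), the update rule quoted above corresponds roughly to the sketch below. The module name, growth interval, factor, and the `grads_finite?` argument are all illustrative, not part of Nx or TensorFlow; the finiteness check itself is exactly what this issue is about.

```elixir
# Illustrative sketch of dynamic loss scaling; names and constants are hypothetical.
defmodule DynamicLossScale do
  @growth_interval 2000
  @factor 2.0

  def new(initial_scale \\ 2.0e15), do: %{scale: initial_scale, good_steps: 0}

  # `grads_finite?` is assumed to come from whatever NaN/Inf check Nx exposes.
  def update(%{scale: scale, good_steps: good} = state, grads_finite?) do
    cond do
      not grads_finite? ->
        # NaN/Inf gradient: skip this step's update and shrink the scale.
        %{state | scale: scale / @factor, good_steps: 0}

      good + 1 >= @growth_interval ->
        # N finite steps in a row: raise the scale.
        %{state | scale: scale * @factor, good_steps: 0}

      true ->
        %{state | good_steps: good + 1}
    end
  end
end
```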

@seanmor5
Collaborator

I think one option for this is to add a check_finite option which calls an element-wise is_finite to make sure each element in the input is not Infinity/NaN and raises otherwise. If the check is enabled there is obviously a performance cost, but it might make debugging easier.

Some functions will of course return Infinity/NaN on certain inputs, so we'll have to handle those explicitly. If the check isn't enabled, we can either let it fail (I don't think this is ideal) or treat it as a no-op, unless the operation has a meaningful result for Infinity/NaN.

To add support from Elixir, I think we would just have to adjust some of the functions that read/write scalars to binaries.
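
For reference, a rough sketch of what such a check could look like on top of element-wise predicates (Nx.is_nan/1 and Nx.is_infinity/1 do exist in current Nx; the `check_finite!` helper and its module are hypothetical, not a proposed API):

```elixir
# Hypothetical helper, assuming element-wise NaN/Inf predicates are available.
defmodule MyNx.Check do
  def check_finite!(tensor) do
    bad? =
      tensor
      |> Nx.is_nan()
      |> Nx.logical_or(Nx.is_infinity(tensor))
      |> Nx.any()
      |> Nx.to_number()

    if bad? == 1 do
      raise ArgumentError, "tensor contains NaN or Infinity"
    end

    tensor
  end
end
```

Something like this costs an extra pass over the data, which is why it would presumably sit behind an opt-in option as suggested above.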

@seanmor5
Collaborator

And another related feature is PyTorch gradient anomaly detection: https://pytorch.org/docs/stable/autograd.html#torch.autograd.detect_anomaly

Raises on any errors (such as NaN) in gradient calculations

@josevalim josevalim added the kind:feature New feature or request label Jan 23, 2021
@josevalim josevalim added the area:nx Applies to nx label Feb 12, 2021
@ondrej-tucek

What are the use cases?

In my humble opinion

  1. It would definitely be valuable from a numerics perspective. There are plenty of numerical methods and algorithms that compute something to a given precision. So if I start calculating some mathematical problem, let's say with float32 precision, I'd also expect the result to be a float32 value or an error message, e.g. {:ok, 2.71} | {:error, "+Inf"}.

  2. IoT: thanks to the Nerves project we can read measured data from many types of sensors and control devices (e.g. a pressure valve). Each sensor's manufacturer (and similarly for control devices) publishes a datasheet documenting the sensor's limitations (e.g. raw binary range, endianness, service life, ...) and how to use it. For example, according to the datasheet you may need to convert measured raw binary data to float32. If Nx had such a conversion function (binary to float32) with a result type of {:ok, val} | {:error, msg}, we could easily see that there is an issue with the sensor, catch the error, send a message to the user, ...

we would just have to adjust some of the functions for reading/writing scalars to binaries.

I think that's a good idea. I haven't looked at the core of Nx deeply yet, so I don't know how complicated it would be in your case. In ours, it was simply an implementation of IEEE 754 conversion for single and double precision.
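
For illustration (not from the thread), a minimal sketch of such a conversion in plain Elixir that classifies the ±Inf and NaN bit patterns explicitly and returns a tagged tuple. The module and function names are made up for this example, and it assumes 4-byte big-endian input:

```elixir
# Hypothetical decoder for raw 32-bit IEEE 754 sensor readings.
defmodule SensorFloat do
  # Exponent bits all ones + zero mantissa => infinity (sign bit picks +/-).
  def decode32(<<0::1, 0xFF::8, 0::23>>), do: {:error, "+Inf"}
  def decode32(<<1::1, 0xFF::8, 0::23>>), do: {:error, "-Inf"}
  # Exponent bits all ones + nonzero mantissa => NaN.
  def decode32(<<_::1, 0xFF::8, _::23>>), do: {:error, "NaN"}
  # Everything else is a finite float and matches Elixir's float segment.
  def decode32(<<value::float-32>>), do: {:ok, value}
  def decode32(_), do: {:error, "expected 4 bytes"}
end
```

For example, `SensorFloat.decode32(<<0x3F, 0x80, 0x00, 0x00>>)` returns `{:ok, 1.0}`, while the +Inf bit pattern `<<0x7F, 0x80, 0x00, 0x00>>` comes back as `{:error, "+Inf"}`.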

@polvalente
Contributor

polvalente commented Jan 6, 2022

Commenting here so we can keep track of all of this in the same place:

@dkuku found a match error in a Neural Net example which turned out to be related to Infinity. Minimal examples which fail like this on both f32 and f64:

iex> Nx.tensor([1.0e32]) |> Nx.power(2) |> Nx.add(Nx.tensor([1]))
** (MatchError) no match of right hand side value: <<0, 0, 128, 127>>
    (nx 0.1.0) lib/nx/binary_backend.ex:632: anonymous fn/10 in Nx.BinaryBackend.element_wise_bin_op/4
    (elixir 1.13.0) lib/enum.ex:4136: Enum.reduce_range/5
    (nx 0.1.0) lib/nx/binary_backend.ex:628: Nx.BinaryBackend.element_wise_bin_op/4
iex(2)> Nx.tensor([1.0e32], type: {:f, 64}) |> Nx.power(9) |> Nx.add(Nx.tensor([1.0e288]))
** (MatchError) no match of right hand side value: <<0, 0, 128, 127>>
    (nx 0.1.0) lib/nx/binary_backend.ex:639: anonymous fn/10 in Nx.BinaryBackend.element_wise_bin_op/4
    (elixir 1.13.0) lib/enum.ex:4136: Enum.reduce_range/5
    (nx 0.1.0) lib/nx/binary_backend.ex:628: Nx.BinaryBackend.element_wise_bin_op/4
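
For context on the error above (not from the original thread): `<<0, 0, 128, 127>>` is the little-endian byte pattern of +Inf in f32 (0x7F800000), and Erlang/Elixir `::float` binary segments refuse to match NaN or Inf, which is why the binary backend's read of the result raises MatchError. A quick reproduction of the underlying limitation, plus a bit-level fallback, is sketched below; the helper is hypothetical and not the actual backend code:

```elixir
# Reproducing the limitation directly (on a little-endian machine):
#
#     <<x::float-32-little>> = <<0, 0, 128, 127>>
#     #=> ** (MatchError) no match of right hand side value: <<0, 0, 128, 127>>
#
# Hypothetical bit-level fallback that handles the special patterns first:
defmodule ReadF32 do
  def read(<<value::float-32-little>>), do: value
  def read(<<0x7F800000::32-little>>), do: :infinity
  def read(<<0xFF800000::32-little>>), do: :neg_infinity
  # Any remaining 4-byte pattern has an all-ones exponent and nonzero mantissa.
  def read(<<_::32>>), do: :nan
end
```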

@tiagodavi
Contributor

It's similar to the error I am facing here:
#614

@polvalente
Contributor

Support for NaN and for negative and positive infinity was added while adding support for complex numbers. A few functions are still outstanding in #792 and #793, but I think this issue can be closed in favor of those specific ones.
