
Handling of NaN and infinity #119

Closed

josevalim opened this issue Dec 28, 2020 · 8 comments
Labels
area:nx Applies to nx kind:feature New feature or request

Comments

@josevalim
Collaborator

Today Nx operations fail if they find a NaN and/or Infinity (although defn behaviour will be compiler independent). Do we need to implement handling of NaN and infinity within Nx? What are the use cases?

@jackalcooper
Collaborator

One use case: dynamic scale. From TF docs:

Dynamic loss scaling works by adjusting the loss scale as training progresses. The goal is to keep the loss scale as high as possible without overflowing the gradients. As long as the gradients do not overflow, raising the loss scale never hurts.
The algorithm starts by setting the loss scale to an initial value. Every N steps that the gradients are finite, the loss scale is increased by some factor. However, if a NaN or Inf gradient is found, the gradients for that step are not applied, and the loss scale is decreased by the factor. This process tends to keep the loss scale as high as possible without gradients overflowing.

https://www.tensorflow.org/api_docs/python/tf/mixed_precision/experimental/DynamicLossScale
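
As an aside (not from the thread), the update rule quoted above corresponds roughly to the sketch below. The module name, growth interval, factor, and the `grads_finite?` argument are all illustrative, not part of Nx or TensorFlow; the finiteness check itself is exactly what this issue is about.

```elixir
# Illustrative sketch of dynamic loss scaling; names and constants are hypothetical.
defmodule DynamicLossScale do
  @growth_interval 2000
  @factor 2.0

  def new(initial_scale \\ 2.0e15), do: %{scale: initial_scale, good_steps: 0}

  # `grads_finite?` is assumed to come from whatever NaN/Inf check Nx exposes.
  def update(%{scale: scale, good_steps: good} = state, grads_finite?) do
    cond do
      not grads_finite? ->
        # NaN/Inf gradient: skip this step's update and shrink the scale.
        %{state | scale: scale / @factor, good_steps: 0}

      good + 1 >= @growth_interval ->
        # N finite steps in a row: raise the scale.
        %{state | scale: scale * @factor, good_steps: 0}

      true ->
        %{state | good_steps: good + 1}
    end
  end
end
```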

@seanmor5
Collaborator

I think one option for this is to add a check_finite option which calls an element-wise is_finite to make sure each element in the input is not Infinity/NaN and raises otherwise. If the check is enabled there is obviously a performance cost, but it might make debugging easier.

Some functions will of course return Infinity/NaN on certain inputs, so we'll have to handle those explicitly. If the check isn't enabled, we can either let it fail (I don't think this is ideal) or treat it as a no-op, unless the operation has a meaningful result for Infinity/NaN.

To add support from Elixir, I think we would just have to adjust some of the functions that read/write scalars to binaries.
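
For reference, a rough sketch of what such a check could look like on top of element-wise predicates (Nx.is_nan/1 and Nx.is_infinity/1 do exist in current Nx; the `check_finite!` helper and its module are hypothetical, not a proposed API):

```elixir
# Hypothetical helper, assuming element-wise NaN/Inf predicates are available.
defmodule MyNx.Check do
  def check_finite!(tensor) do
    bad? =
      tensor
      |> Nx.is_nan()
      |> Nx.logical_or(Nx.is_infinity(tensor))
      |> Nx.any()
      |> Nx.to_number()

    if bad? == 1 do
      raise ArgumentError, "tensor contains NaN or Infinity"
    end

    tensor
  end
end
```

Something like this costs an extra pass over the data, which is why it would presumably sit behind an opt-in option as suggested above.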

@seanmor5
Collaborator

And another related feature is PyTorch gradient anomaly detection: https://pytorch.org/docs/stable/autograd.html#torch.autograd.detect_anomaly

Raises on any errors (such as NaN) in gradient calculations

@josevalim josevalim added the kind:feature New feature or request label Jan 23, 2021
@josevalim josevalim added the area:nx Applies to nx label Feb 12, 2021
@ondrej-tucek

What are the use cases?

In my humble opinion

  1. It would definitely be valuable from a numerics perspective. There are plenty of numerical methods and algorithms that compute something to a given precision. So if I start calculating some mathematical problem, let's say with float32 precision, I'd also expect the result to be a float32 value or an error message, e.g. {:ok, 2.71} | {:error, "+Inf"}.

  2. IoT: thanks to the Nerves project we can read measured data from many types of sensors and control devices (e.g. a pressure valve). Each sensor's manufacturer (and similarly for control devices) publishes a datasheet documenting the sensor's limitations (e.g. raw binary range, endianness, service life, ...) and how to use it. For example, according to the datasheet you may need to convert measured raw binary data to float32. If Nx had such a conversion function (binary to float32) with a result type of {:ok, val} | {:error, msg}, we could easily see that there is an issue with the sensor, catch the error, send a message to the user, ...

we would just have to adjust some of the functions for reading/writing scalars to binaries.

I think that's a good idea. I haven't looked at the core of Nx deeply yet, so I don't know how complicated it would be in your case. In ours, it was simply an implementation of IEEE 754 conversion for single and double precision.
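
For illustration (not from the thread), a minimal sketch of such a conversion in plain Elixir that classifies the ±Inf and NaN bit patterns explicitly and returns a tagged tuple. The module and function names are made up for this example, and it assumes 4-byte big-endian input:

```elixir
# Hypothetical decoder for raw 32-bit IEEE 754 sensor readings.
defmodule SensorFloat do
  # Exponent bits all ones + zero mantissa => infinity (sign bit picks +/-).
  def decode32(<<0::1, 0xFF::8, 0::23>>), do: {:error, "+Inf"}
  def decode32(<<1::1, 0xFF::8, 0::23>>), do: {:error, "-Inf"}
  # Exponent bits all ones + nonzero mantissa => NaN.
  def decode32(<<_::1, 0xFF::8, _::23>>), do: {:error, "NaN"}
  # Everything else is a finite float and matches Elixir's float segment.
  def decode32(<<value::float-32>>), do: {:ok, value}
  def decode32(_), do: {:error, "expected 4 bytes"}
end
```

For example, `SensorFloat.decode32(<<0x3F, 0x80, 0x00, 0x00>>)` returns `{:ok, 1.0}`, while the +Inf bit pattern `<<0x7F, 0x80, 0x00, 0x00>>` comes back as `{:error, "+Inf"}`.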

@polvalente
Contributor

polvalente commented Jan 6, 2022

Commenting here so we can keep track of all of this in the same place:

@dkuku found a match error in a Neural Net example which turned out to be related to Infinity. Minimal examples which fail like this on both f32 and f64:

iex> Nx.tensor([1.0e32]) |> Nx.power(2) |> Nx.add(Nx.tensor([1]))
** (MatchError) no match of right hand side value: <<0, 0, 128, 127>>
    (nx 0.1.0) lib/nx/binary_backend.ex:632: anonymous fn/10 in Nx.BinaryBackend.element_wise_bin_op/4
    (elixir 1.13.0) lib/enum.ex:4136: Enum.reduce_range/5
    (nx 0.1.0) lib/nx/binary_backend.ex:628: Nx.BinaryBackend.element_wise_bin_op/4
iex(2)> Nx.tensor([1.0e32], type: {:f, 64}) |> Nx.power(9) |> Nx.add(Nx.tensor([1.0e288]))
** (MatchError) no match of right hand side value: <<0, 0, 128, 127>>
    (nx 0.1.0) lib/nx/binary_backend.ex:639: anonymous fn/10 in Nx.BinaryBackend.element_wise_bin_op/4
    (elixir 1.13.0) lib/enum.ex:4136: Enum.reduce_range/5
    (nx 0.1.0) lib/nx/binary_backend.ex:628: Nx.BinaryBackend.element_wise_bin_op/4
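
For context on the error above (not from the original thread): `<<0, 0, 128, 127>>` is the little-endian byte pattern of +Inf in f32 (0x7F800000), and Erlang/Elixir `::float` binary segments refuse to match NaN or Inf, which is why the binary backend's read of the result raises MatchError. A quick reproduction of the underlying limitation, plus a bit-level fallback, is sketched below; the helper is hypothetical and not the actual backend code:

```elixir
# Reproducing the limitation directly (on a little-endian machine):
#
#     <<x::float-32-little>> = <<0, 0, 128, 127>>
#     #=> ** (MatchError) no match of right hand side value: <<0, 0, 128, 127>>
#
# Hypothetical bit-level fallback that handles the special patterns first:
defmodule ReadF32 do
  def read(<<value::float-32-little>>), do: value
  def read(<<0x7F800000::32-little>>), do: :infinity
  def read(<<0xFF800000::32-little>>), do: :neg_infinity
  # Any remaining 4-byte pattern has an all-ones exponent and nonzero mantissa.
  def read(<<_::32>>), do: :nan
end
```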

@tiagodavi
Contributor

It's similar to the error I am facing here:
#614

@polvalente
Contributor

Support for NaN and for negative and positive infinity was added while adding support for complex numbers. A few functions are still outstanding in #792 and #793, but I think this issue can be closed in favor of those specific ones.
