Motivation and description
I think most implementations would require clipping.
For example, Step (exponential decay with a step size) with clipping that sets a lower bound on the learning rate. These were common in previous versions of Flux.
So I think it would be very useful if such functionality were provided by default through keyword arguments.
If it is already implemented, the current documentation does not describe it well, as I couldn't find it in the docs.
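To make the request concrete, here is a rough, language-agnostic sketch of the behavior I have in mind, written in Python only for illustration. The function name and the keyword names (`start`, `decay`, `step_size`, `clip`) are my own assumptions and are not taken from any existing API: the rate is multiplied by `decay` once every `step_size` iterations and is never allowed to fall below `clip`.

```python
def clipped_step_decay(t, start=1e-2, decay=0.5, step_size=10, clip=1e-3):
    """Learning rate at iteration t (0-indexed).

    Hypothetical helper: stepwise exponential decay with a lower bound.
    All parameter names are illustrative, not from a real library.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    # Number of decay steps completed so far.
    n_decays = t // step_size
    # Decay the starting rate, then clamp it from below at `clip`.
    return max(start * decay ** n_decays, clip)


if __name__ == "__main__":
    # Print the rate at the start of each of the first few steps.
    for t in range(0, 60, 10):
        print(t, clipped_step_decay(t))
```

With these defaults the rate halves every 10 iterations until it would drop below `clip`, after which it stays at `clip`. Ideally the lower bound would just be one more keyword argument on the schedule.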
Possible Implementation
No response