The paper says alpha should be 0.99 at the beginning (when global_step is small) and 0.999 at the end (when global_step is large). However, the code has:
alpha = min(1 - 1 / (global_step + 1), alpha)
With this, alpha is 0 when global_step is 0, stays below the configured value while global_step is small, and equals the configured alpha (set to 0.99 in the parameters) once global_step reaches 99. The code seems to differ from what the paper presents. The paper's description suggests code like:
alpha = max(1 - 1 / (global_step + 1), alpha)
Has anyone else run into this?
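To make the difference concrete, here is a minimal sketch that evaluates both expressions at a few steps. The function names `ema_decay_min` and `ema_decay_max` are made up for illustration and are not from the repository; the default of 0.99 is the parameter value mentioned above.

```python
def ema_decay_min(global_step: int, alpha: float = 0.99) -> float:
    """Expression as it appears in the code: starts at 0, capped at alpha."""
    return min(1 - 1 / (global_step + 1), alpha)


def ema_decay_max(global_step: int, alpha: float = 0.99) -> float:
    """Alternative proposed in this issue: never drops below alpha."""
    return max(1 - 1 / (global_step + 1), alpha)


if __name__ == "__main__":
    # Print both variants side by side for a few representative steps.
    for step in (0, 1, 9, 99, 999, 9999):
        print(step, ema_decay_min(step), ema_decay_max(step))
```

With these definitions, the `min` variant climbs from 0 and then holds at 0.99, while the `max` variant starts at 0.99, passes 0.999 around global_step = 999, and keeps approaching 1 rather than stopping at 0.999.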
I have the same confusion. What's more, alpha is a function of global_step, so when the batch size changes, the number of steps per epoch changes too. But the paper says alpha is tied to the ramp-up epochs.
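A rough sketch of the batch-size point, assuming a hypothetical training set of 50,000 examples (not a figure from the paper or the repository) and taking 99 as the step at which `1 - 1 / (global_step + 1)` reaches 0.99:

```python
import math

RAMP_STEPS = 99  # step at which 1 - 1/(global_step + 1) reaches 0.99


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every example once."""
    return math.ceil(num_examples / batch_size)


def epochs_to_reach_cap(num_examples: int, batch_size: int) -> float:
    """Fraction of an epoch that RAMP_STEPS steps correspond to."""
    return RAMP_STEPS / steps_per_epoch(num_examples, batch_size)


if __name__ == "__main__":
    num_examples = 50_000  # hypothetical dataset size, for illustration only
    for batch_size in (100, 250, 500):
        print(batch_size,
              steps_per_epoch(num_examples, batch_size),
              epochs_to_reach_cap(num_examples, batch_size))
```

Under these assumptions the same 99 steps cover about 0.2 epochs at batch size 100 but almost a full epoch at batch size 500, so a step-based alpha schedule does not map to a fixed number of epochs.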