Questions about code #13
@sweetdream33

Hi, thank you for publishing the code.
(1) I am trying to run your code, but I found a strange point: the ECE value varies greatly depending on the type and parameters of the optimizer. I have experimented with LBFGS and Adam, adjusting the learning rate and max_iter. So I added the line 'optimizer.zero_grad()' to the original code; adding it stabilizes the ECE value to some extent. Is it right to add this?
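To show where I put it, here is a minimal sketch of the usual PyTorch training-loop pattern (all names below are placeholders, not the code from this repository):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Minimal stand-ins for a real model and data loader.
model = nn.Linear(10, 2)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(3)]

for inputs, labels in loader:
    optimizer.zero_grad()   # clear gradients left over from the previous step
    loss = criterion(model(inputs), labels)
    loss.backward()         # accumulates new gradients into .grad
    optimizer.step()
```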
(2) Also, I added the line 'model.eval()' before 'logits_list = []'. If I add this, the ECE values are better. Don't I need to add it?
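Concretely, the logit collection I mean looks roughly like this (a sketch with stand-in names; in the real code, model and valid_loader would be the trained network and the validation loader):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the trained network and the validation loader.
model = nn.Sequential(nn.Linear(10, 16), nn.Dropout(0.5), nn.Linear(16, 2))
valid_loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(3)]

model.eval()  # disable dropout and use running batch-norm statistics
logits_list, labels_list = [], []
with torch.no_grad():  # gradients are not needed while collecting logits
    for inputs, labels in valid_loader:
        logits_list.append(model(inputs))
        labels_list.append(labels)
logits = torch.cat(logits_list)
labels = torch.cat(labels_list)
```

Without eval(), dropout stays active, so the collected logits are noisy train-mode outputs rather than the deterministic test-time outputs that calibration should be measured on.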
I'll wait for an answer.

Comments

I presume yes. Although I'm not an expert in PyTorch, all the training-loop examples I've seen zero the gradients of the model and optimizer before everything else.
My impression is that the changes you are seeing are mostly from the NumPy and PyTorch randomness; see #16. When I fix the seeds, neither change makes a difference.
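Fixing the seeds can be done along these lines (a sketch; exactly which calls matter depends on which libraries the script actually uses):

```python
import random
import numpy as np
import torch

SEED = 0
random.seed(SEED)                 # Python's built-in RNG
np.random.seed(SEED)              # NumPy RNG
torch.manual_seed(SEED)           # PyTorch CPU RNG
torch.cuda.manual_seed_all(SEED)  # PyTorch GPU RNGs (no-op without CUDA)
```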
See https://stats.stackexchange.com/questions/284712/how-does-the-l-bfgs-work/285106 . I think since we are using L-BFGS we should not be calling optimizer.zero_grad() after each minibatch; instead, let the gradients accumulate over several minibatches, then do the update and set the gradients to zero again. That might help improve the temperature.
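For context, PyTorch's LBFGS is driven through a closure that re-evaluates the loss, since the optimizer makes several internal function evaluations per step; zero_grad() belongs inside that closure. A minimal temperature-scaling-style sketch (all names and values are placeholders, not the repository's code):

```python
import torch
import torch.nn as nn

# Stand-in validation logits and labels.
logits = torch.randn(100, 10)
labels = torch.randint(0, 10, (100,))

temperature = nn.Parameter(torch.ones(1) * 1.5)  # the single scalar being fit
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.LBFGS([temperature], lr=0.01, max_iter=50)

def closure():
    optimizer.zero_grad()                           # reset before each internal evaluation
    loss = criterion(logits / temperature, labels)  # NLL of temperature-scaled logits
    loss.backward()
    return loss

optimizer.step(closure)  # one step runs up to max_iter internal iterations
print(temperature.item())
```

Because L-BFGS builds its curvature estimate from successive gradient evaluations, it is usually run on the full calibration set at once rather than on minibatches.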