
Questions about code #13

Open
sweetdream33 opened this issue Mar 26, 2019 · 3 comments

Comments

@sweetdream33

sweetdream33 commented Mar 26, 2019

Hi, thank you for publishing the code.
(1) I am trying to run your code, but I noticed something strange: the ECE value varies greatly depending on the type and hyperparameters of the optimizer.
I have experimented with LBFGS and Adam, adjusting the learning rate and max_iter.

So I added the line 'optimizer.zero_grad()' to the original code.
Adding this line stabilizes the ECE value to some extent.
Is it right to add this?

    loss = nll_criterion(self.temperature_scale(val_logits), val_labels)
    optimizer.zero_grad()
    loss.backward()
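
For reference, here is a fuller sketch of the fitting step as I am now running it, using the closure form that torch.optim.LBFGS is driven with (the names self.temperature, self.temperature_scale, nll_criterion, val_logits, and val_labels follow the snippet above; the lr and max_iter values are just the ones I happened to try):

    optimizer = torch.optim.LBFGS([self.temperature], lr=0.01, max_iter=50)

    def closure():
        # clear any gradients left over from the previous closure evaluation
        optimizer.zero_grad()
        loss = nll_criterion(self.temperature_scale(val_logits), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)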

(2) Also, I added the line 'model.eval()' before 'logits_list = []'.
If I add this, the ECE values are better, so shouldn't it be added?
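
In case it is unclear, this is roughly what the logit-collection loop looks like with that added line (the valid_loader name and the .cuda() calls are my guesses about the surrounding setup):

    model.eval()  # switch BatchNorm/Dropout layers to inference behaviour
    logits_list = []
    labels_list = []
    with torch.no_grad():
        for input, label in valid_loader:
            logits_list.append(model(input.cuda()))
            labels_list.append(label.cuda())
    val_logits = torch.cat(logits_list)
    val_labels = torch.cat(labels_list)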

I'll wait for an answer.

@kirk86

kirk86 commented Apr 2, 2019

@sweetdream33
(1) Indeed, I've noticed the same: ECE values vary depending on the optimizer and hyperparameter choices such as max_iter.

Is it right to add this?

I presume yes. I'm not an expert in PyTorch, but in all the examples I've seen, the training loop zeroes the gradients of the model and optimizer before everything else:

    model.zero_grad()
    optimizer.zero_grad()
    loss = nll_criterion(self.temperature_scale(val_logits), val_labels)
    loss.backward()

@dreamflasher

My impression is that the changes you are seeing come mostly from NumPy and PyTorch randomness. See here: #16

When I fix the seeds, neither model.eval() nor model.zero_grad() has any impact anymore. optimizer.zero_grad() still makes a difference, though it does not consistently improve the results.
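
For completeness, this is the kind of seeding I mean (a minimal sketch; the seed value is arbitrary):

    import random
    import numpy as np
    import torch

    seed = 0
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)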

@yshvrdhn

yshvrdhn commented Feb 6, 2020

https://stats.stackexchange.com/questions/284712/how-does-the-l-bfgs-work/285106. I think since we are using L-BFGS, we should not be calling optimizer.zero_grad() after each minibatch; we should let the gradients accumulate over several minibatches, then do the update and set the gradients to zero again. That might help improve the temperature.
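
For illustration, one way to do that is to accumulate gradients over all validation minibatches inside the L-BFGS closure and only zero them at the start of each closure evaluation. This is only a sketch: val_batches stands in for precomputed (logits, labels) minibatches, and the per-batch losses are averaged so the result matches a full-batch pass.

    import torch

    # synthetic stand-in for precomputed validation (logits, labels) minibatches
    val_batches = [(torch.randn(8, 10), torch.randint(0, 10, (8,))) for _ in range(4)]

    nll_criterion = torch.nn.CrossEntropyLoss()
    temperature = torch.nn.Parameter(torch.ones(1) * 1.5)
    optimizer = torch.optim.LBFGS([temperature], lr=0.01, max_iter=50)

    def closure():
        optimizer.zero_grad()  # reset once per closure evaluation, not per minibatch
        total_loss = 0.0
        for logits, labels in val_batches:
            # gradients accumulate across minibatches; averaging keeps the full-batch scale
            loss = nll_criterion(logits / temperature, labels) / len(val_batches)
            loss.backward()
            total_loss += loss.item()
        return total_loss  # L-BFGS converts the returned value with float()

    optimizer.step(closure)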
