
Estimating the Hessian with the Fisher-diag method proposed in the paper raises "Trying to backward through the graph a second time" #30

Open
ariescts opened this issue Mar 12, 2022 · 3 comments

Comments

@ariescts

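The Fisher-diag method proposed in the paper estimates the Hessian matrix, which requires the gradient of each layer's pre-activation. When the code actually runs, however, the line cur_grad = get_grad(cali_data[i * batch_size:(i + 1) * batch_size]) in save_grad_data fails on the second batch with "Trying to backward through the graph a second time". The first batch does not raise any error. Have the authors run into anything similar?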
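For readers unfamiliar with this error, here is a minimal sketch of one common way it arises in PyTorch: a tensor is computed once, stays attached to the autograd graph, and is then reused in every batch. This is only an illustration of the general error class, not a diagnosis of this repository. The names w, scale, and batch_loss are hypothetical and do not come from the project's code.

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, 3, requires_grad=True)   # hypothetical layer weight
batches = torch.randn(2, 8, 3)              # two hypothetical calibration batches

def batch_loss(x, s):
    # Loss whose graph passes through both w and the cached value s.
    return ((x @ w.t()) / s).pow(2).mean()

# Cached once, outside the loop, and still attached to the graph of w.
scale = (w ** 2).sum().sqrt()

errors = []
for x in batches:
    try:
        batch_loss(x, scale).backward()     # batch 1 frees the saved tensors behind scale
    except RuntimeError as e:
        errors.append(str(e))               # batch 2 re-enters the freed sub-graph

# One possible fix: cut the cached value out of the graph before reusing it.
# (Recomputing it inside the loop would also work if its gradient is needed.)
w.grad = None
scale_detached = (w ** 2).sum().sqrt().detach()

fixed_errors = []
for x in batches:
    try:
        batch_loss(x, scale_detached).backward()
    except RuntimeError as e:
        fixed_errors.append(str(e))

print(len(errors), len(fixed_errors))
```

If the failing code has the same shape, checking which tensors are created during the first call and kept on the model or quantizer afterwards may help locate the shared sub-graph. Passing retain_graph=True to backward would also silence the error, at the cost of keeping the old graph in memory.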

@ariescts
Author

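Also, the paper states that H(z) is estimated with g(z)^2, but the code uses (|g(z)|+1)^2. Because the values of g(z) are very small, the resulting approximate matrix is practically indistinguishable from the identity matrix. Does this mean that fisher-diag and mse are essentially the same reconstruction loss?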
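The point about the two weightings can be checked numerically. Since (|g|+1)^2 = 1 + 2|g| + g^2, the weight is close to 1 whenever |g| is much smaller than 1. The sketch below compares how strongly each formula separates elements. The gradient magnitudes are made-up values chosen only to be "very small" in the sense described above, not measurements from the model.

```python
# Hypothetical pre-activation gradient magnitudes (assumed, not measured).
g = [1e-4, 5e-4, 1e-3, 2e-3]

w_paper = [x ** 2 for x in g]               # g(z)^2, as stated in the paper
w_code = [(abs(x) + 1) ** 2 for x in g]     # (|g(z)|+1)^2, as described for the code

# Ratio between the largest and smallest per-element weight:
# a ratio near 1 means every element is weighted almost equally, as in plain MSE.
spread_paper = max(w_paper) / min(w_paper)
spread_code = max(w_code) / min(w_code)

# Largest deviation of the code's weights from an identity weighting.
max_dev_from_identity = max(abs(w - 1.0) for w in w_code)

print(spread_paper, spread_code, max_dev_from_identity)
```

With gradients of this size, the g^2 weighting differs across elements by orders of magnitude, while the (|g|+1)^2 weighting stays within a fraction of a percent of 1. Whether real gradients in the model are this small is the assumption to verify.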

@blueardour

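Reading the code, I can see that the choice between mse and Fisher-diag is configurable. I am curious, though, how much that choice affects accuracy. The paper does not seem to include an ablation study comparing the two.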

@junhyukso

I'm also curious about the effect of the empirical Fisher loss. Has anyone tested it?
