[Question FBGEMM_GPU] Adam optimizer not optimized #2824
Another issue: when I specify the output data type as bf16, it fails with a "not implemented" error.
Hi @JacoCheung,
You can move Adam off the experimental optimizer list by setting … to False. This should make it more performant.
We have enabled BF16 output for every optimizer. Could you share an error log?
Regarding the fp16 output dtype: FBGEMM does not support a loss scaler for the backward pass / optimizer update. Is this intended?
Which scaler are you referring to?
The loss scaler used in mixed-precision training.
Could you please share a link to the scaler you're referring to? Thanks.
Sorry for the confusion; let me clarify. By "scaler" I mean the generic concept in mixed-precision training, especially fp16 training. In fp16 training, the loss is usually multiplied by a scale factor, so the gradients computed in the backward pass (dgrad) are scaled as well. Those gradients (wgrad, or dgrad) must then be unscaled before the weight update. However, fbgemm_gpu fuses the optimizer update into the backward pass (TBE has no explicit wgrad step), so there is no point at which the caller can unscale them. I would therefore expect the TBE operator's forward() to accept a scaling factor and perform the unscaling during the backward stage.
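The request above can be illustrated with a small pure-Python sketch. It assumes plain (element-wise) Adam applied to a single embedding row, and a hypothetical fused_backward_adam function that receives inv_loss_scale (1/S); neither is FBGEMM's actual API or kernel. With the loss multiplied by S, the gradient reaching the fused step is S·g, so the step itself has to multiply by 1/S before updating the Adam moments, and has to skip the update on inf/NaN, which is what a loss scaler normally does between backward and the optimizer step.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class AdamState:
    """Per-element Adam moments for one embedding row (illustrative only)."""
    m: List[float]
    v: List[float]
    step: int = 0


def adam_update(weights, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Reference Adam step on an *unscaled* gradient; mutates weights and state."""
    state.step += 1
    for i, g in enumerate(grads):
        state.m[i] = beta1 * state.m[i] + (1 - beta1) * g
        state.v[i] = beta2 * state.v[i] + (1 - beta2) * g * g
        m_hat = state.m[i] / (1 - beta1 ** state.step)
        v_hat = state.v[i] / (1 - beta2 ** state.step)
        weights[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def fused_backward_adam(weights, scaled_grads, state, inv_loss_scale, **adam_kwargs):
    """Hypothetical fused backward+update that is told 1/S by the caller.

    scaled_grads were produced from loss * S, i.e. they equal S * g. Because the
    optimizer step runs inside backward, nobody else can unscale them, so this
    function does it. Returns False and leaves weights/state untouched if any
    unscaled gradient is inf/NaN (the "skip step on overflow" behaviour).
    """
    grads = [g * inv_loss_scale for g in scaled_grads]
    if not all(math.isfinite(g) for g in grads):
        return False
    adam_update(weights, grads, state, **adam_kwargs)
    return True


if __name__ == "__main__":
    S = 1024.0  # loss scale chosen by a (hypothetical) scaler
    w = [1.0, 1.0]
    st = AdamState(m=[0.0, 0.0], v=[0.0, 0.0])
    ok = fused_backward_adam(w, [0.5 * S, -0.25 * S], st, inv_loss_scale=1.0 / S)
    print(ok, w, st.m)
```

Note that Adam's normalized step m_hat / sqrt(v_hat) is roughly invariant to a constant S (apart from eps), so skipping the unscale may look harmless. The stored moments still absorb the factor S, however, and once a dynamic scaler changes S between iterations the moments mix different scales; overflow detection also needs to happen before the update is applied. Both are reasons the scale factor would have to be passed into the fused kernel.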
Hi team, I'm using the Adam optimizer for my model, but I get a performance warning. Can it be resolved? Or do you have any quantitative numbers for the performance degradation?
I also noticed that there was an earlier discussion about the optimizer.
It seems that Adam was not considered for optimization. I'd like to know what the current plan for Adam is. Thanks!