
Low bit Optimizers & FA-3 #742

Open
asahni04 opened this issue Dec 16, 2024 · 2 comments

@asahni04

Hi, have there been any tests with FA-3 and the low-bit optimizers from torchao, such as FP8 AdamW or 8-bit AdamW? I see training diverge when resuming an FA-2 checkpoint with FA-3, or when using 8-bit AdamW.
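For reference, a minimal sketch of the optimizer swap in question, assuming torchao's low-bit optimizers (the module path has moved across torchao releases; torchao.prototype.low_bit_optim is the older location):

```python
import torch
from torchao.prototype.low_bit_optim import AdamW8bit, AdamWFp8

model = torch.nn.Linear(1024, 1024, device="cuda", dtype=torch.bfloat16)

# Drop-in replacements for torch.optim.AdamW; the optimizer moments
# (exp_avg, exp_avg_sq) are stored in 8-bit / FP8 instead of FP32.
optim = AdamW8bit(model.parameters(), lr=3e-4)  # or AdamWFp8(...)
```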
@fegin
Contributor

fegin commented Dec 16, 2024

cc @weifengpy

@weifengpy
Contributor

Hey @asahni04, do you happen to have a breakdown along these lines?

  • baseline: load the FA-2 checkpoint with the FA-2 model and full-precision AdamW
  • switch only the attention kernel to FA-3
  • switch only the optimizer to 8-bit AdamW

That would help clarify whether the divergence comes from FA-3 (model state dict) or from 8-bit AdamW (optimizer state dict).
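A minimal sketch of that three-run ablation, assuming torchao's AdamW8bit (older module path torchao.prototype.low_bit_optim); the toy Linear model and fake checkpoint are placeholders for the real FA-2/FA-3 model and its saved state:

```python
import torch
from torchao.prototype.low_bit_optim import AdamW8bit  # older module path

def run_variant(use_fa3: bool, use_8bit_optim: bool, ckpt: dict):
    # Toy stand-in; in the real run, the FA-2/FA-3 choice is the model's
    # attention-kernel config and does not change the state dict keys.
    model = torch.nn.Linear(16, 16)
    model.load_state_dict(ckpt["model"])

    optim_cls = AdamW8bit if use_8bit_optim else torch.optim.AdamW
    optim = optim_cls(model.parameters(), lr=3e-4)
    if not use_8bit_optim:
        # Only the full-precision runs restore optimizer state 1:1;
        # loading FP32 AdamW moments into an 8-bit optimizer quantizes
        # them, which can itself perturb training on resume.
        optim.load_state_dict(ckpt["optim"])
    return model, optim

# Build a fake "FA-2 checkpoint" so the sketch is self-contained.
base = torch.nn.Linear(16, 16)
ckpt = {"model": base.state_dict(),
        "optim": torch.optim.AdamW(base.parameters(), lr=3e-4).state_dict()}

run_variant(use_fa3=False, use_8bit_optim=False, ckpt=ckpt)  # baseline
run_variant(use_fa3=True,  use_8bit_optim=False, ckpt=ckpt)  # isolate FA-3
run_variant(use_fa3=False, use_8bit_optim=True,  ckpt=ckpt)  # isolate 8-bit AdamW
```

If the baseline matches the original run, whichever single switch first reproduces the divergence points at the culprit.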
