
Low bit Optimizers & FA-3 #742

Open
asahni04 opened this issue Dec 16, 2024 · 2 comments

@asahni04

Hi, have there been any tests with FA-3 and the low-bit optimizers from torchao, such as FP8 AdamW or 8-bit AdamW? I see training diverge when resuming an FA-2 checkpoint with FA-3, or when using 8-bit AdamW.
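For reference, a minimal sketch of the optimizer swap in question, assuming torchao's low-bit optimizers (the module path has moved across torchao releases; torchao.prototype.low_bit_optim is the older location):

```python
import torch
from torchao.prototype.low_bit_optim import AdamW8bit, AdamWFp8

model = torch.nn.Linear(1024, 1024, device="cuda", dtype=torch.bfloat16)

# Drop-in replacements for torch.optim.AdamW; the optimizer moments
# (exp_avg, exp_avg_sq) are stored in 8-bit / FP8 instead of FP32.
optim = AdamW8bit(model.parameters(), lr=3e-4)  # or AdamWFp8(...)
```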
@fegin
Contributor

fegin commented Dec 16, 2024

cc @weifengpy

@weifengpy
Contributor

Hey @asahni04, do you happen to have a breakdown along these lines?

  • baseline: load the FA-2 checkpoint with the FA-2 model and full-precision AdamW
  • switch only the attention kernel to FA-3
  • switch only the optimizer to 8-bit AdamW

That would help clarify whether the divergence comes from FA-3 (model state dict) or from 8-bit AdamW (optimizer state dict).
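A minimal sketch of that three-run ablation, assuming torchao's AdamW8bit (older module path torchao.prototype.low_bit_optim); the toy Linear model and fake checkpoint are placeholders for the real FA-2/FA-3 model and its saved state:

```python
import torch
from torchao.prototype.low_bit_optim import AdamW8bit  # older module path

def run_variant(use_fa3: bool, use_8bit_optim: bool, ckpt: dict):
    # Toy stand-in; in the real run, the FA-2/FA-3 choice is the model's
    # attention-kernel config and does not change the state dict keys.
    model = torch.nn.Linear(16, 16)
    model.load_state_dict(ckpt["model"])

    optim_cls = AdamW8bit if use_8bit_optim else torch.optim.AdamW
    optim = optim_cls(model.parameters(), lr=3e-4)
    if not use_8bit_optim:
        # Only the full-precision runs restore optimizer state 1:1;
        # loading FP32 AdamW moments into an 8-bit optimizer quantizes
        # them, which can itself perturb training on resume.
        optim.load_state_dict(ckpt["optim"])
    return model, optim

# Build a fake "FA-2 checkpoint" so the sketch is self-contained.
base = torch.nn.Linear(16, 16)
ckpt = {"model": base.state_dict(),
        "optim": torch.optim.AdamW(base.parameters(), lr=3e-4).state_dict()}

run_variant(use_fa3=False, use_8bit_optim=False, ckpt=ckpt)  # baseline
run_variant(use_fa3=True,  use_8bit_optim=False, ckpt=ckpt)  # isolate FA-3
run_variant(use_fa3=False, use_8bit_optim=True,  ckpt=ckpt)  # isolate 8-bit AdamW
```

If the baseline matches the original run, whichever single switch first reproduces the divergence points at the culprit.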
