FP8 Fine Tuning Crashes #248
Nevermind, it was only crashing when I used virtual console mode. I switched to an xfce4 session and it doesn't crash anymore. I installed the stable version of TransformerEngine. Edit: I reinstalled MS-AMP and I still get this error message, and then I reinstalled the stable version of TransformerEngine and still get the error message.
Edit: I got this error message with the stable version of TransformerEngine.
I get this error message if I set max_len to 300, or anything higher than 100 for that matter, whenever I try to train with FP8. I'm using cuda-12.4.0-2 and the nightly CUDA 12.4 PyTorch builds, and I have MS-AMP and TransformerEngine installed.