LLaMA 2 Flash Attention Patch Not Working For 70B #32
Comments
For anyone else wondering why:
@philschmid Thanks for getting back to me, and thanks for your work and blog post! I eventually saw that. I was able to use your work to get finetuning working for 7B, 13B, and 70B. Instead of a single forward pass, having two and choosing between them based on which model is being used is good enough for me! Used it here: Thanks again!
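A minimal sketch of what "two forward passes chosen by model" could look like; this is not the actual code from the repo or blog post. `forward_mha_flash`, `forward_gqa_flash`, and `pick_patched_forward` are hypothetical placeholders, and the config attributes assume Hugging Face's `LlamaConfig`.

```python
# Sketch only: dispatch between two patched attention forwards based on whether
# the checkpoint uses grouped-query attention (LLaMA 2 70B) or standard
# multi-head attention (7B/13B).
from transformers import AutoConfig


def forward_mha_flash(*args, **kwargs):
    ...  # hypothetical patched forward for 7B/13B (num_key_value_heads == num_attention_heads)


def forward_gqa_flash(*args, **kwargs):
    ...  # hypothetical patched forward for 70B (num_key_value_heads = 8 < num_attention_heads = 64)


def pick_patched_forward(model_name: str):
    config = AutoConfig.from_pretrained(model_name)
    # Older configs may not define num_key_value_heads; fall back to MHA.
    kv_heads = getattr(config, "num_key_value_heads", config.num_attention_heads)
    if kv_heads == config.num_attention_heads:
        return forward_mha_flash
    return forward_gqa_flash
```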
@mallorbc Ah, nice! I'll try to make it compatible with both soonish. But we are also working on adding native support in
@philschmid The old 70B patch, while it supports a forward and backward pass, still has issues. When I try to generate text with the model after training with QLoRA, I don't get the expected results. When I try to use text-generation-inference, I get shape issues as well. These issues do not exist for the 7B and 13B models, which work great! Just thought I would let you know. Thanks!
This repo has a working implementation for all models (7B, 13B, and 70B): https://github.com/oKatanaaa/llama-flash-attention-patch/tree/master He gets part of the solution from here, which is licensed as Apache 2.0: May be useful.
The flash attention patch seems to be working for LLaMA 7B and LLaMA 13B (though I need to confirm more than just a successful backward pass). However, for whatever reason, for LLaMA 70B, I am getting an error like the following:
File "/datadrive/Finetune_LLMs/finetuning_repo/llama_patch.py", line 47, in forward
key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 190, 64, 128]' is invalid for input of size 194560
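For context, the numbers in the error are consistent with LLaMA 2 70B's grouped-query attention: with `num_heads = 64`, `head_dim = 128`, but only `num_key_value_heads = 8`, `k_proj` produces 8 × 128 = 1024 features per token, so 190 tokens give 190 × 1024 = 194560 values, which cannot be viewed as `[1, 190, 64, 128]`. Below is a minimal sketch, not the repo's actual patch, of a GQA-aware reshape; `shape_qkv` is a hypothetical helper and the attribute names assume a Hugging Face `LlamaAttention`-style module.

```python
# Sketch only: reshape K/V with num_key_value_heads (8 on 70B) instead of
# num_heads (64), then optionally repeat the K/V heads so kernels that expect
# equal head counts still work.
import torch


def shape_qkv(self, hidden_states: torch.Tensor):
    bsz, q_len, _ = hidden_states.size()

    # Queries always use num_heads.
    query_states = self.q_proj(hidden_states).view(
        bsz, q_len, self.num_heads, self.head_dim
    ).transpose(1, 2)

    # Keys/values: on 70B, k_proj/v_proj output num_key_value_heads * head_dim
    # (= 1024) features per token, so viewing them with num_heads (64) fails.
    kv_heads = getattr(self, "num_key_value_heads", self.num_heads)
    key_states = self.k_proj(hidden_states).view(
        bsz, q_len, kv_heads, self.head_dim
    ).transpose(1, 2)
    value_states = self.v_proj(hidden_states).view(
        bsz, q_len, kv_heads, self.head_dim
    ).transpose(1, 2)

    # Repeat each K/V head num_heads // kv_heads times so all tensors end up
    # with num_heads heads (what transformers' repeat_kv helper does).
    if kv_heads != self.num_heads:
        n_rep = self.num_heads // kv_heads
        key_states = key_states.repeat_interleave(n_rep, dim=1)
        value_states = value_states.repeat_interleave(n_rep, dim=1)

    return query_states, key_states, value_states
```

If the installed flash-attn version handles GQA/MQA natively (fewer K/V heads than query heads), the repeat step can likely be skipped and the K/V tensors passed through with `kv_heads` heads.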