
Llama 2 Flash Attention Patch Not Working For 70B #32

Open
mallorbc opened this issue Sep 7, 2023 · 6 comments

Comments


mallorbc commented Sep 7, 2023

The flash attention patch seems to be working for Llama 7B and Llama 13B (though I need to confirm more than just a successful backward pass). However, for whatever reason, for Llama 70B I am getting an error like the following:

File "/datadrive/Finetune_LLMs/finetuning_repo/llama_patch.py", line 47, in forward
key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 190, 64, 128]' is invalid for input of size 194560
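
For context: Llama 2 70B uses grouped-query attention, so k_proj and v_proj produce only num_key_value_heads * head_dim features per token (8 * 128 = 1024, and 1 * 190 * 1024 = 194560), not num_heads * head_dim (64 * 128 = 8192), which is why viewing their output as [bsz, q_len, self.num_heads, self.head_dim] fails. A minimal sketch of the adjustment, assuming the patch mirrors the attribute names of transformers' LlamaAttention (in particular self.num_key_value_heads):

# Sketch only, not the actual patch: view key/value states with the number of
# key/value heads, which on Llama 2 70B (grouped-query attention) is smaller
# than num_heads; on 7B/13B the two are equal, so either form works there.
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
value_states = self.v_proj(hidden_states).view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)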

mallorbc (Author) commented Sep 7, 2023

For anyone else wondering why:
#30

mallorbc closed this as completed Sep 7, 2023
philschmid (Owner) commented

Hey @mallorbc,

I needed to revert #30 since it broke training for 7B and 13B. I haven't had the chance to look at it again.

mallorbc (Author) commented Sep 8, 2023

@philschmid Thanks for getting back to me and thanks for your work and blog post!

I eventually saw that.

I was able to use your work to get finetuning working for 7B, 13B, and 70B. Instead of having one forward pass, having two, selected based on which model is being used, is good enough for me!

Used it here:
https://github.com/mallorbc/Finetune_LLMs/blob/main/finetuning_repo/llama_patch.py
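
A minimal sketch of that idea, assuming a Hugging Face LlamaConfig-like object; the helper below is illustrative and not necessarily what the linked file does:

# Sketch only: decide which patched forward to install based on the config.
def uses_grouped_query_attention(config):
    kv_heads = getattr(config, "num_key_value_heads", config.num_attention_heads)
    return kv_heads != config.num_attention_heads  # True for 70B (8 vs. 64 heads)

# e.g. install a 70B-aware forward when this returns True for model.config,
# and the original 7B/13B forward otherwise.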

Thanks again!

philschmid (Owner) commented

@mallorbc Ah, nice! I'll try to make it compatible with both soonish. But we are also working on adding native support in transformers, so in a few weeks you will no longer need to patch this.

mallorbc (Author) commented Sep 8, 2023

@philschmid The old 70B patch, while it supports a forward and backward pass, still has issues.

When I try to generate text with the model after training with QLoRA, I don't get the expected results. When I try to use text-generation-inference, I get shape issues as well.

These issues do not exist for the 7B and 13B models, which work great!

Just thought I would let you know. Thanks!

mallorbc reopened this Sep 8, 2023
mallorbc (Author) commented Sep 9, 2023

This repo has a working implementation for all models: 7B, 13B, and 70B.
It's licensed under GPL 3.0, but for my repo, which is AGPL, that is fine.

https://github.com/oKatanaaa/llama-flash-attention-patch/tree/master

He gets part of the solution from here, which is licensed under Apache 2.0:
https://github.com/LAION-AI/Open-Assistant/blob/04fa9a24b2a58c8885b8aa6a2eb02b18de6b4961/model/model_training/models/patching_llama.py

May be useful.
