RuntimeError in Prefix #870
Comments
Could you please provide more information: What model are you using, what data, how do you train the model, what is the full stacktrace? Otherwise, it's hard to help you.
@BenjaminBossan we had two very similar errors here and here
@BenjaminBossan Sorry, I can't post all the code. I used LLaMA-Effcient-Tuning and added the prefix tuning method on top of it, so it would be very troublesome to extract just this part, and the amount of code would be huge if it were not extracted. The relevant branch in the PEFT forward method is:

```python
if peft_config.peft_type == PeftType.PREFIX_TUNING:
    past_key_values = self.get_prompt(batch_size)
    return self.base_model(input_ids=input_ids, past_key_values=past_key_values, **kwargs)
else:
    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    # concat prompt labels
    if labels is not None:
        prefix_labels = torch.full((batch_size, peft_config.num_virtual_tokens), -100).to(self.device)
        kwargs["labels"] = torch.cat((prefix_labels, labels), dim=1)
    prompts = self.get_prompt(batch_size=batch_size)
    prompts = prompts.to(inputs_embeds.dtype)
    inputs_embeds = torch.cat((prompts, inputs_embeds), dim=1)
    return self.base_model(inputs_embeds=inputs_embeds, **kwargs)
```

This is the main difference between prefix tuning and the other PEFT methods: it passes the learned prompt to the model as past_key_values rather than prepending it to the inputs. So I checked the forward method of BloomModel in transformers (transformers/models/bloom/modeling_bloom.py, lines 761 to 791):

```python
for i, (block, layer_past) in enumerate(zip(self.h, past_key_values)):
    if output_hidden_states:
        all_hidden_states = all_hidden_states + (hidden_states,)

    if self.gradient_checkpointing and self.training:

        def create_custom_forward(module):
            def custom_forward(*inputs):
                # None for past_key_value
                return module(*inputs, use_cache=use_cache, output_attentions=output_attentions)

            return custom_forward

        outputs = torch.utils.checkpoint.checkpoint(
            create_custom_forward(block),
            hidden_states,
            alibi,
            causal_mask,
            layer_past,
            head_mask[i],
        )
```

I think that during the backward pass only hidden_states gets a gradient, not layer_past from past_key_values, and that this is what leads to the error.
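If it helps, a minimal, self-contained way to probe that hypothesis in isolation is to check whether torch.utils.checkpoint passes a gradient back to an extra tensor input such as layer_past. The block function and tensor shapes below are made up for illustration; only the checkpoint call mirrors the pattern quoted above.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Toy stand-ins (arbitrary shapes): "hidden" mimics frozen hidden_states,
# "past" mimics a trainable prefix tensor like layer_past from get_prompt().
hidden = torch.randn(2, 4)                    # requires_grad=False, like a frozen base model input
past = torch.randn(2, 4, requires_grad=True)  # trainable, like the prefix parameters

def block(hidden_states, layer_past):
    # Stand-in for a transformer block that consumes the cached prefix.
    return (hidden_states + layer_past).sum()

out = checkpoint(block, hidden, past)
out.backward()

# Shows which inputs actually received a gradient through the checkpoint.
print("past.grad:", past.grad)
print("hidden.grad:", hidden.grad)
```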
I don't have much experience with prefix tuning, maybe @pacman100 has an idea here.
@liuao743 I also ran into this problem while adding the prefix tuning method to LLaMA-Effcient-Tuning. Is there a solution yet, for example upgrading or downgrading the peft or transformers version?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
To fix this error, you need to modify the transformers/src/transformers/models/llama/modeling_llama.py file and set past_key_value and use_cache to None; then it works.
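The gradient-checkpointing branch of the llama forward loop is structured much like the Bloom snippet quoted earlier in this thread. As a rough sketch of what the suggested workaround amounts to, reusing the names from that snippet (this is an illustration only, not the exact transformers source, which differs between versions):

```python
if self.gradient_checkpointing and self.training:
    # Suggested workaround: do not feed the cached prefix into the
    # checkpointed forward, and disable the cache entirely.
    layer_past = None
    use_cache = None

    def create_custom_forward(module):
        def custom_forward(*inputs):
            # None for past_key_value
            return module(*inputs, use_cache=use_cache, output_attentions=output_attentions)

        return custom_forward

    outputs = torch.utils.checkpoint.checkpoint(
        create_custom_forward(block),
        hidden_states,
        alibi,
        causal_mask,
        layer_past,
        head_mask[i],
    )
```

Note that this would drop the learned prefix from the checkpointed layers, so it sidesteps the crash rather than training through past_key_values.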
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
System Info
When I use prefix tuning on LLaMA, the following error occurs:
The peft version is 0.4.0
I have tried all the other tuning methods supported by PEFT; this problem did not occur with them.
Who can help?
No response
Information
Tasks
examples folder
Reproduction
My code is as follows:
The specific values are:
Expected behavior
Solve this problem.