I have checked the documentation of the related framework and cannot find useful information.
I have searched the existing issues and there is no similar one.
Information about environment
OS: Ubuntu
Python: 3.10
GPUs: 2x NVIDIA RTX 4090
Log output
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
return model_forward(*args, **kwargs)
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/peft/peft_model.py", line 1644, in forward
return self.base_model(
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 197, in forward
return self.model.forward(*args, **kwargs)
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1183, in forward
loss = self.loss_function(logits, labels, self.vocab_size, **loss_kwargs)
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/transformers/loss/loss_utils.py", line 46, in ForCausalLMLoss
loss = fixed_cross_entropy(shift_logits, shift_labels, num_items_in_batch, ignore_index, **kwargs)
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/transformers/loss/loss_utils.py", line 28, in fixed_cross_entropy
loss = loss / num_items_in_batch
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
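The error itself is just a cross-device tensor operation. Below is a minimal illustration of my understanding of what happens (an assumption: with the model sharded across the two GPUs via device_map="auto", the logits/loss end up on cuda:1 while the num_items_in_batch tensor passed into the loss was created on cuda:0):

import torch

# loss is produced on the GPU holding the last model shard (assumed cuda:1 here),
# while the Trainer-provided num_items_in_batch tensor sits on cuda:0
loss = torch.tensor(2.5, device="cuda:1")
num_items_in_batch = torch.tensor(8, device="cuda:0")

try:
    loss = loss / num_items_in_batch  # cross-device division, same op as loss_utils.py line 28
except RuntimeError as e:
    print(e)  # "Expected all tensors to be on the same device, ... cuda:1 and cuda:0!"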
Description
Steps to reproduce
This happens with Qwen2.5-7B-Instruct.
The problem can be reproduced with the following steps:
Start an ordinary PEFT (LoRA) fine-tuning run on the two GPUs; the error is raised as soon as the first loss is computed (see the sketch below).
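A rough sketch of the kind of run that triggers it. The dataset, LoRA settings, and hyperparameters below are toy placeholders, not the exact command I used; the relevant ingredients are device_map="auto" sharding across both GPUs, a peft adapter, and the Trainer loss path in transformers 4.46.x:

import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# shard the 7B model across both 4090s
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# attach a small LoRA adapter (illustrative rank and target modules)
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

# tiny toy dataset: identical sequences so the default collator needs no padding
enc = tokenizer(["hello world"] * 8)
train_ds = Dataset.from_dict({**enc, "labels": enc["input_ids"]})

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2, max_steps=1),
    train_dataset=train_ds,
)
trainer.train()  # with transformers 4.46.x this crashes in fixed_cross_entropy with the device error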
Expected results
Training should proceed normally, without the device-mismatch error.
Attempts to fix
Anything else helpful for investigation
Downgrading transformers to 4.45.0 makes training work again.
Looks like an issue with transformers after the loss functions were reworked in 4.46.
For a hot fix, could you try editing this line
File "/root/anaconda3/envs/newpy10/lib/python3.10/site-packages/transformers/loss/loss_utils.py", line 28, in fixed_cross_entropy
loss = loss / num_items_in_batch
to
loss = loss / torch.tensor(num_items_in_batch, device=loss.device)
or stay on transformers<4.46.0 until a proper fix is released.
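If editing the installed file is inconvenient, roughly the same change can be applied as a monkey patch at the top of the training script. This is only a sketch, assuming transformers 4.46.x where fixed_cross_entropy lives in transformers.loss.loss_utils (as shown in the traceback above):

import torch
import transformers.loss.loss_utils as loss_utils

_orig_fixed_cross_entropy = loss_utils.fixed_cross_entropy

def _patched_fixed_cross_entropy(source, target, num_items_in_batch=None, ignore_index=-100, **kwargs):
    # move the divisor onto the same device as the logits before the division
    # that currently raises "Expected all tensors to be on the same device"
    if isinstance(num_items_in_batch, torch.Tensor):
        num_items_in_batch = num_items_in_batch.to(source.device)
    return _orig_fixed_cross_entropy(source, target, num_items_in_batch, ignore_index, **kwargs)

loss_utils.fixed_cross_entropy = _patched_fixed_cross_entropy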
This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.