Does the framework currently not support fine-tuning glm-4v-9b with --deepspeed zero3-offload? #1333

Closed
jhrsya opened this issue Jul 9, 2024 · 1 comment


jhrsya commented Jul 9, 2024

Describe the bug
What the bug is and how to reproduce it, preferably with screenshots.

Command:

NPROC_PER_NODE=4 \
swift sft \
    --model_type glm4v-9b-chat \
    --dataset data.json \
    --model_id_or_path /model_weights/glm-4v-9b \
    --output_dir output \
    --ddp_find_unused_parameters true \
    --add_output_dir_suffix false \
    --batch_size 1 \
    --deepspeed zero3-offload
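
For reference, the zero3-offload preset turns on DeepSpeed ZeRO stage 3 with CPU offload. Below is a minimal sketch of what such a config typically contains, written out as a hand-rolled JSON file from Python. The keys are standard DeepSpeed zero_optimization options; the exact preset bundled with swift may set additional fields, and the filename here is only an example:

# zero3_offload_sketch.py -- illustrative only; swift's bundled preset may differ.
import json

ds_config = {
    "zero_optimization": {
        "stage": 3,                              # partition params, grads, and optimizer state
        "offload_optimizer": {"device": "cpu"},  # keep optimizer state in CPU RAM
        "offload_param": {"device": "cpu"},      # keep partitioned params in CPU RAM
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)

With stage 3, from_pretrained runs under deepspeed.zero.Init, which partitions each parameter as soon as the owning submodule is constructed; that is the partition_parameters.py wrapper visible in the traceback below.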

Traceback:

 model = automodel_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 113, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 556, in from_pretrained
    model, tokenizer = get_model_tokenizer_from_repo(model_dir, torch_dtype, model_kwargs, load_model, **kwargs)
  File "/data/vjuicefs_nlp/11169265/vlm/Inference/swift/swift/llm/utils/model.py", line 915, in get_model_tokenizer_from_repo
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 76, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3375, in from_pretrained
    model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs, load_model, **kwargs)
  File "/data/vjuicefs_nlp/11169265/vlm/Inference/swift/swift/llm/utils/model.py", line 1506, in get_model_tokenizer_chatglm
    model = cls(config, *model_args, **model_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 459, in wrapper
    f(module, *args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 931, in __init__
    self.transformer = ChatGLMModel(config, empty_init=empty_init, device=device)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 459, in wrapper
    f(module, *args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 772, in __init__
    self.embedding = init_method(Embedding, config, **init_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/utils/init.py", line 52, in skip_init
    return module_cls(*args, **kwargs).to_empty(device=final_device)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 459, in wrapper
    f(module, *args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 735, in __init__
    self.word_embeddings = nn.Embedding(
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 466, in wrapper
    self._post_init_method(module)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 995, in _post_init_method
    param.data = param.data.to(self.local_device)
NotImplementedError: Cannot copy out of meta tensor; no data!
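
The traceback points at a concrete conflict rather than a generic lack of support: modeling_chatglm.py builds its embedding through torch.nn.utils.skip_init, which constructs the module on the "meta" device (shape-only tensors with no storage) and only later materializes it via to_empty(). Under ZeRO-3, DeepSpeed's zero.Init hook fires immediately after each submodule's __init__ and calls param.data.to(local_device); on a meta tensor that copy is impossible, hence the error. A minimal sketch of the same failure in plain PyTorch, no DeepSpeed required (names are illustrative):

# meta_tensor_conflict.py -- minimal illustration of the error above.
import torch.nn as nn

# skip_init-style construction: the parameter lives on the "meta" device,
# i.e. it carries shape/dtype but no actual data.
emb = nn.Embedding(10, 4, device="meta")

# DeepSpeed's ZeRO-3 _post_init_method effectively does this to every
# freshly constructed parameter, before skip_init can call to_empty():
try:
    emb.weight.data = emb.weight.data.to("cpu")
except NotImplementedError as e:
    print(e)  # -> Cannot copy out of meta tensor; no data!

In other words, any init path that keeps parameters on the meta device clashes with ZeRO-3's eager parameter partitioning; ZeRO stage 2, which does not partition parameters at construction time, should not hit this code path.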
@tastelikefeet (Collaborator) commented:

Looks like GPU memory (VRAM) overflow.

jhrsya closed this as completed on Jul 11, 2024.