
Cannot run the Model generated from the example script #251

Open
hz-nm opened this issue Nov 27, 2024 · 1 comment

hz-nm commented Nov 27, 2024

I was testing the library by training the model on a single GPU. I used the following command to run the training:

CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=1 run_train.py --config-file examples/config_tiny_llama.yaml

I made the following changes in the config_tiny_llama.yaml file:

parallelism:
  dp: 1 # 2
  expert_parallel_size: 1
  pp: 1 # 2
  pp_engine: 1f1b
  tp: 1 # 2
  tp_linear_async_communication: true
  tp_mode: REDUCE_SCATTER
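As a side note, nanotron sizes its process grid from these values, so the torchrun world size has to match the product of the parallelism degrees. A quick illustrative check, assuming the usual `world_size = dp * pp * tp` relation:

```python
# Illustrative sanity check: with dp = pp = tp = 1, the config needs
# exactly one process, matching --nproc_per_node=1 in the command above.
dp, pp, tp = 1, 1, 1
world_size = dp * pp * tp
assert world_size == 1
```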

The training ran smoothly and the checkpoints were generated. However, when I try to run the model using

torchrun --nproc_per_node=1 run_generate.py --ckpt-path checkpoints/10/ --tp 1 --pp 1

I get the following error:

[rank0]:   File "/mnt/d/nanotron-pretrain/nanotron/src/nanotron/models/llama.py", line 529, in forward
[rank0]:     (query_unpad, indices_q, cu_seqlens_q, max_seqlen_q) = bert_padding.unpad_input(
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 4)
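For context, this traceback is the standard Python failure mode when a function returns a longer tuple than the caller destructures, which suggests my installed flash-attn version returns five values from `bert_padding.unpad_input` rather than four. A minimal, library-free sketch of the failure mode, using a stand-in function (the real signature depends on the flash-attn version):

```python
def fake_unpad_input_new():
    # Hypothetical 5-tuple mimicking newer flash-attn releases,
    # which return an extra value from unpad_input.
    return ("states", "indices", "cu_seqlens", "max_seqlen", "extra")

try:
    # The call site in llama.py expects exactly 4 values.
    (query_unpad, indices_q, cu_seqlens_q, max_seqlen_q) = fake_unpad_input_new()
    error = None
except ValueError as exc:
    error = str(exc)

print(error)  # too many values to unpack (expected 4)
```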

Any help resolving this issue would be greatly appreciated. Thanks.


hz-nm commented Nov 28, 2024

So I changed the lines a bit in models/llama.py, i.e. unpacked an extra return value (my installed flash-attn version apparently returns five values from unpad_input instead of four):

(query_unpad, indices_q, cu_seqlens_q, max_seqlen_q, _) = bert_padding.unpad_input(
    query_states,
    sequence_mask,
)
(key_unpad, indices_k, cu_seqlens_k, max_seqlen_k, _) = bert_padding.unpad_input(
    key_states, sequence_mask
)
(value_unpad, _, _, _, _) = bert_padding.unpad_input(value_states, sequence_mask)
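A more version-tolerant variant (just a sketch, not nanotron's official fix) would use starred unpacking so the same call site works whether unpad_input returns four or five values:

```python
def unpack_unpad(result):
    """Unpack unpad_input's return regardless of whether the installed
    flash-attn version yields a 4-tuple or a 5-tuple; any extra trailing
    values are discarded."""
    unpad, indices, cu_seqlens, max_seqlen, *_ = result
    return unpad, indices, cu_seqlens, max_seqlen

# Works for both arities (illustrated with placeholder tuples):
old_style = ("q", "i", "cu", "max")           # assumed older flash-attn
new_style = ("q", "i", "cu", "max", "extra")  # assumed newer flash-attn
assert unpack_unpad(old_style) == unpack_unpad(new_style)
```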

Here are the generations:

11/28/2024 12:45:40 [INFO|DP=0|PP=0|TP=0]: input: [CLS] the [UNK] [UNK] [UNK] is [SEP]
11/28/2024 12:45:40 [INFO|DP=0|PP=0|TP=0]: generation: [SEP] [SEP] [SEP] [SEP] (the [SEP] token repeats for the rest of the generated sequence)
