
might be a waste of resources #31745

Closed
SDaoer opened this issue Jul 2, 2024 · 4 comments

Comments


SDaoer commented Jul 2, 2024

        while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
            # prepare model inputs
            model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)

            # forward pass to get next token
            outputs = self(
                **model_inputs,
                return_dict=True,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
            )

            if synced_gpus and this_peer_finished:
                continue  # don't waste resources running the code we don't need

            ...

Why is this condition checked after `outputs` has already been generated? Isn't that itself a waste of resources? Could this check be moved to the beginning of the while loop?

if synced_gpus and this_peer_finished:
    continue  # don't waste resources running the code we don't need

The code comes from transformers/generation/utils.py, in GenerationMixin._sample.

@amyeroberts (Collaborator)

cc @gante @zucchini-nlp


github-actions bot commented Aug 2, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

gante (Member) commented Aug 2, 2024

Hi @SDaoer 👋 Thank you for opening this issue 🤗

TL;DR: without it, the code will hang in specific settings.

The answer is documented in the code:

# Under synced_gpus the `forward` call must continue until all gpus complete their sequence.

You can complement this comment with the meaning of synced_gpus:

synced_gpus (`bool`, *optional*):
    Whether to continue running the while loop until max_length. Unless overridden this flag will be set to
    `True` under DeepSpeed ZeRO Stage 3 multiple GPUs environment to avoid hanging if one GPU finished
    generating before other GPUs. Otherwise it'll be set to `False`.

