
might be a waste of resources #31745

Closed
SDaoer opened this issue Jul 2, 2024 · 4 comments

Comments


SDaoer commented Jul 2, 2024

        while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
            # prepare model inputs
            model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)

            # forward pass to get next token
            outputs = self(
                **model_inputs,
                return_dict=True,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
            )

            if synced_gpus and this_peer_finished:
                continue  # don't waste resources running the code we don't need

            ...

Why is this condition checked after `outputs` has already been generated? Isn't that itself a waste of resources? Could this check be moved to the beginning of the while loop?

if synced_gpus and this_peer_finished:
    continue  # don't waste resources running the code we don't need

The code comes from transformers/generation/utils.py, in GenerationMixin._sample.

@amyeroberts (Collaborator)

cc @gante @zucchini-nlp


github-actions bot commented Aug 2, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

gante (Member) commented Aug 2, 2024

Hi @SDaoer 👋 Thank you for opening this issue 🤗

TL;DR: without it, the code will hang in specific settings.

The answer is documented in the code:

# Under synced_gpus the `forward` call must continue until all gpus complete their sequence.

You can complement this comment with the meaning of synced_gpus:

synced_gpus (`bool`, *optional*):
    Whether to continue running the while loop until max_length. Unless overridden this flag will be set to
    `True` under DeepSpeed ZeRO Stage 3 multiple GPUs environment to avoid hanging if one GPU finished
    generating before other GPUs. Otherwise it'll be set to `False`.

