[whisper] compile compatibility with long-form decoding #31772
What does this PR do?
PR #31166 introduced a static k/v cache for Whisper short-form decoding. It was noted in that PR that the current generation logic is not compatible with sequential long-form generation, since the batch size is reduced dynamically in two places: the outer loop over audio segments and the inner generation loop.
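A toy sketch of the effect (illustrative names and shapes only, not the actual Whisper source):

```python
import torch

# Stand-in for the dynamic batching in sequential long-form generation.
input_features = torch.randn(4, 80, 3000)  # (batch, mel bins, frames)

# When a sequence finishes transcribing its audio, it is dropped from the batch.
still_active = torch.tensor([True, True, False, True])
input_features = input_features[still_active]  # batch size shrinks: 4 -> 3

# With a static k/v cache allocated per batch size, the next `.generate` call
# would build a fresh cache object for batch size 3; torch.compile sees new
# tensor data pointers and triggers a re-compile.
print(input_features.shape)  # torch.Size([3, 80, 3000])
```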
For `torch.compile` compatibility with our current cache design, we require the batch size to be fixed. Otherwise, for every batch size we create a new cache object in `.generate`, which changes the data pointers of the k/v cache tensors, causing a re-compile. As things currently stand, we get re-compiles due to both the outer and inner loop dynamically changing the batch size. This PR introduces a simple fix: pad the inputs to the max batch size before calling the model, and remove any padded outputs before post-processing.
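A minimal sketch of this pad-then-trim idea, assuming hypothetical names (`pad_batch`, `MAX_BATCH_SIZE`) and a stand-in for the actual model call:

```python
import torch

MAX_BATCH_SIZE = 4  # assumption: fixed when the static cache is first allocated

def pad_batch(features: torch.Tensor, max_batch_size: int) -> torch.Tensor:
    """Zero-pad the batch dimension so the model always sees a fixed batch size."""
    n_pad = max_batch_size - features.shape[0]
    if n_pad <= 0:
        return features
    padding = torch.zeros(
        (n_pad, *features.shape[1:]), dtype=features.dtype, device=features.device
    )
    return torch.cat([features, padding], dim=0)

# Two of the four sequences are still active at this point in the outer loop.
active_features = torch.randn(2, 80, 3000)  # (batch, mel bins, frames)

padded = pad_batch(active_features, MAX_BATCH_SIZE)
assert padded.shape[0] == MAX_BATCH_SIZE  # static cache shape is preserved

# ... run the compiled model on `padded`; stand-in for the generated token ids:
generated_ids = torch.zeros(MAX_BATCH_SIZE, 10, dtype=torch.long)

# Drop the padded rows before post-processing.
generated_ids = generated_ids[: active_features.shape[0]]
print(generated_ids.shape)  # torch.Size([2, 10])
```

Since the padded rows never reach post-processing, the extra forward compute on them is the only cost, in exchange for a single compiled graph across the whole long-form loop.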
The alternative would be to change the `batch_idx_map` logic such that we always keep the full sequence of input features, but only update the sequence generations for the elements of interest. Having tried this quickly as a PoC, I found the changes more involved than those proposed in this PR, and they quickly clutter the dynamic generation logic, which we're retaining for faster eager mode.