[Bugfix] Fix M-RoPE position calculation when chunked prefill is enabled #10388
Fix `MRotaryEmbedding`'s `get_input_positions` when chunked prefill is enabled. Currently it only slices the generated `llm_positions` on the left-hand side and forgets the right-hand side. This PR adds the right-hand slice bound so that chunked prefill is supported.

vllm/vllm/model_executor/layers/rotary_embedding.py
Lines 923 to 928 in 1d75472
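A minimal sketch of the slicing change follows. It uses plain Python lists in place of the real `(3, num_tokens)` position tensor. The helper names `slice_left_only` and `slice_both_sides` are hypothetical. `context_len` and `seq_len` are assumed to be the start and end offsets of the tokens scheduled in the current chunk. This illustrates the idea and does not reproduce the vLLM code.

```python
def slice_left_only(llm_positions, context_len):
    # Pre-fix behaviour described above: only the left-hand bound is applied,
    # so every position from context_len to the end of the prompt is kept.
    return [row[context_len:] for row in llm_positions]


def slice_both_sides(llm_positions, context_len, seq_len):
    # Post-fix behaviour: keep exactly the positions in [context_len, seq_len).
    return [row[context_len:seq_len] for row in llm_positions]


# Text-only 40-token prompt: all three M-RoPE rows are simply 0..39.
prompt_positions = [list(range(40)) for _ in range(3)]

# First chunk covers tokens [0, 32).
old = slice_left_only(prompt_positions, context_len=0)
new = slice_both_sides(prompt_positions, context_len=0, seq_len=32)
print(len(old[0]), len(new[0]))  # 40 32
```

With only the left bound, the first chunk receives 40 positions for 32 scheduled tokens. With both bounds, the counts match.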
Explanation
To make this clearer, here is an example with the following configuration:
- a prompt of length 40
- `enable_chunked_prefill=True`
- `max_num_batched_tokens=32`
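Under this configuration, the prompt is prefilled in two chunks. The sketch below works through the arithmetic. It assumes, for simplicity, that this request is the only one in the batch, so each step can use the full token budget. `prefill_chunks` is a hypothetical helper, not a vLLM API.

```python
PROMPT_LEN = 40
BUDGET = 32  # max_num_batched_tokens


def prefill_chunks(prompt_len, max_num_batched_tokens):
    """Split a prompt into (context_len, seq_len) chunks.

    Simplifying assumption: the request is alone in the batch, so every
    step spends the whole token budget on it.
    """
    chunks = []
    context_len = 0
    while context_len < prompt_len:
        seq_len = min(context_len + max_num_batched_tokens, prompt_len)
        chunks.append((context_len, seq_len))
        context_len = seq_len
    return chunks


for context_len, seq_len in prefill_chunks(PROMPT_LEN, BUDGET):
    scheduled = seq_len - context_len        # tokens actually run this step
    left_only = PROMPT_LEN - context_len     # positions kept by [context_len:]
    both_sides = seq_len - context_len       # positions kept by [context_len:seq_len]
    print(context_len, seq_len, scheduled, left_only, both_sides)
# 0 32 32 40 32
# 32 40 8 8 8
```

Only non-final chunks are affected. In the last chunk, `seq_len` equals the prompt length, so the missing right-hand bound happens to be harmless.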
Add some logging in `model_runner.py::ModelInputForGPUBuilder::build` near:
vllm/vllm/worker/model_runner.py
Lines 952 to 957 in 1d75472
Result:
Related error log:
The error occurs near:
vllm/vllm/model_executor/layers/rotary_embedding.py
Lines 807 to 825 in 1d75472
About the test I added
Qwen2-VL's M-RoPE only takes effect when there are multi-modal inputs, so an image is included in the test inputs.
However, Qwen2-VL currently does not work properly when chunked prefill is enabled and there are multi-modal inputs (it assumes the input is never chunked):
vllm/vllm/model_executor/models/qwen2_vl.py
Lines 1229 to 1238 in 1d75472
Here I use a hacky workaround: provide a zero-length image to satisfy this check.
This meets both requirements and lets the test proceed.
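The image matters for the following reason. When a prompt contains an image, the temporal, height and width components of the M-RoPE positions differ. For text-only tokens, all three components are identical. Below is a simplified, Qwen2-VL-style construction for one image surrounded by text. It rests on these assumptions:
- the grid sizes are already the LLM-side grid (after spatial merging);
- video and multiple images are ignored;
- `mrope_positions` is a hypothetical helper.

The last lines take the first chunk of such a prompt with both slice bounds.

```python
def mrope_positions(text_before, grid_t, grid_h, grid_w, text_after):
    """Return [temporal, height, width] position rows for: text, one image, text."""
    rows = ([], [], [])
    # Leading text tokens: all three components share the same 1-D index.
    for i in range(text_before):
        for row in rows:
            row.append(i)
    start = text_before
    # Image tokens in (t, h, w) order: each component follows its own grid axis.
    for t in range(grid_t):
        for h in range(grid_h):
            for w in range(grid_w):
                rows[0].append(start + t)
                rows[1].append(start + h)
                rows[2].append(start + w)
    # Trailing text resumes one past the largest position used so far.
    nxt = max(max(row) for row in rows) + 1 if rows[0] else 0
    for i in range(text_after):
        for row in rows:
            row.append(nxt + i)
    return [list(row) for row in rows]


positions = mrope_positions(text_before=2, grid_t=1, grid_h=2, grid_w=2, text_after=2)
for name, row in zip(("t", "h", "w"), positions):
    print(name, row)
# t [0, 1, 2, 2, 2, 2, 4, 5]
# h [0, 1, 2, 2, 3, 3, 4, 5]
# w [0, 1, 2, 3, 2, 3, 4, 5]

# First chunk with a 5-token budget: keep columns [0, 5) on every row.
first_chunk = [row[0:5] for row in positions]
print(first_chunk[2])  # [0, 1, 2, 3, 2]
```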