[do not merge][CB] Reduce wastage in prefill compute and pad blocks in homogeneous continuous batching #262

yannicks1 · 2025-06-24T13:31:44Z

solves #255

Signed-off-by: Yannick Schnider <[email protected]>

github-actions · 2025-06-24T13:32:02Z

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Yannick Schnider <[email protected]>

yannicks1 · 2025-06-30T16:59:09Z

Great news: this runs on Spyre 🎉

I just ran cb_spyre_inference.py, which (with the parameters on this branch) exercises all of the functionality:

prefill seq 1 of size 128 (left pads)
prefill seq 2 of size 64 (left pads) (see [CB] Reduce wastage in prefill compute and pad blocks in homogeneous continuous batching #255)
decode of batch 2
strips fully padded blocks (see [CB] strip repeated left padding on batch level #131)
prefill seq 3 by left padding to 66 (align tkv) and right padding to 128 (pad to block boundary)
decode of batch 2
decode of batch 1 (see [CB] add min batch size of 2 in decode #182)

cc: @tdoublep @JRosenkranz @joerunde @nikolaospapandreou @sducouedic
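
The padding arithmetic in the steps above can be sketched as follows. This is a minimal illustration, not the code from this branch: the block size of 64 is an assumption inferred from the sizes 64 and 128 in the list, the function names `strip_padded_blocks` and `prefill_padding` are hypothetical, and the prompt length of 50 and the pre-strip tkv of 130 are made-up numbers chosen so that the result matches the "left padding to 66, right padding to 128" step.

```python
BLOCK_SIZE = 64  # assumed block size; not stated in this thread


def strip_padded_blocks(tkv, left_pads, block_size=BLOCK_SIZE):
    """Drop leading blocks that are pure padding for every sequence.

    left_pads holds the number of left-pad tokens per running sequence.
    Returns the reduced tkv and the reduced per-sequence left pads.
    """
    n_blocks = min(left_pads) // block_size
    offset = n_blocks * block_size
    return tkv - offset, [p - offset for p in left_pads]


def prefill_padding(prompt_len, tkv, block_size=BLOCK_SIZE):
    """Pad a new prompt to join a running batch.

    Left-pad up to the current tkv (align tkv), then right-pad up to the
    next block boundary. Returns (left_pad, right_pad, padded_len).
    """
    if prompt_len > tkv:
        raise ValueError("prompt does not fit under the current tkv")
    left_pad = tkv - prompt_len
    padded_len = -(-tkv // block_size) * block_size  # ceiling to a block multiple
    right_pad = padded_len - tkv
    return left_pad, right_pad, padded_len


# Hypothetical state: one sequence left in the batch with 70 left-pad tokens.
tkv, left_pads = strip_padded_blocks(tkv=130, left_pads=[70])
left, right, total = prefill_padding(prompt_len=50, tkv=tkv)
print(tkv, left_pads, left, right, total)  # 66 [6] 16 62 128
```

With these assumed numbers, one fully padded block is stripped (tkv drops from 130 to 66), and the third sequence is prefilled at a padded length of 128 instead of a longer one.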

yannicks1 · 2025-07-01T14:47:49Z

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

yannicks1 · 2025-07-04T15:01:11Z

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

Signed-off-by: Yannick Schnider <[email protected]>

yannicks1 · 2025-07-17T16:06:55Z

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

yannicks1 · 2025-07-17T16:30:29Z

6/7 tests passed on the Spyre card! The one failure looks like a known issue unrelated to this PR. 🥳

first implementation of optimization

7df971e

Signed-off-by: Yannick Schnider <[email protected]>

fix adding new blocks

5e1d468

Signed-off-by: Yannick Schnider <[email protected]>

yannicks1 mentioned this pull request Jun 23, 2025

[CB] Reduce wastage in prefill compute and pad blocks in homogeneous continuous batching #255

Open

Merge branch 'main' into ysc-homog-tkv-opt-joshua

c8a33de

yannicks1 self-assigned this Jun 26, 2025

make mask contiguous

49d92f5

Signed-off-by: Yannick Schnider <[email protected]>

yannicks1 force-pushed the ysc-homog-tkv-opt-joshua branch 2 times, most recently from a7e7ae9 to 49d92f5 · June 27, 2025 22:40

yannicks1 and others added 3 commits June 30, 2025 08:02

Merge branch 'main' into ysc-homog-tkv-opt-joshua

fce5d91

Signed-off-by: Yannick Schnider <[email protected]>

testing parameters

df75e41

Signed-off-by: Yannick Schnider <[email protected]>

fix fmt

226db17

Signed-off-by: Yannick Schnider <[email protected]>

yannicks1 added 2 commits July 2, 2025 08:13

Merge branch 'main' into ysc-homog-tkv-opt-joshua

cdbaa45

Merge branch 'main' into ysc-homog-tkv-opt-joshua

c0fe359

yannicks1 and others added 4 commits July 10, 2025 09:42

Merge branch 'main' into ysc-homog-tkv-opt-joshua

21da7da

Signed-off-by: Yannick Schnider <[email protected]>

Merge branch 'main' into ysc-homog-tkv-opt-joshua

0830748

Signed-off-by: Yannick Schnider <[email protected]>

Merge branch 'main' into ysc-homog-tkv-opt-joshua

4f4706b

fix bug and tests

a33d5e5

Signed-off-by: Yannick Schnider <[email protected]>
