[do not merge][CB] Reduce wastage in prefill compute and pad blocks in homogeneous continuous batching #262
Signed-off-by: Yannick Schnider <[email protected]>
a7e7ae9 to 49d92f5
Great news: this runs on Spyre 🎉 I just ran
cc: @tdoublep @JRosenkranz @joerunde @nikolaospapandreou @sducouedic
bot:test
bot:test
bot:test
6/7 tests passed on the Spyre card! Looks like the failure is a known issue unrelated to this PR. 🥳
solves #255