Pull requests: HabanaAI/vllm-fork


Add PP support for Prefill/Decode Disaggregation
#1364 opened Jun 4, 2025 by lvliang-intel
Changes to enable LMCache v1 baseline
#1361 opened Jun 3, 2025 by hsubramony
Set simple_compile_backend for HpuPlatform
#1359 opened Jun 3, 2025 by Kacper-Pietkun
[SW-229465] Search for multiple RoPE modules
#1355 opened Jun 3, 2025 by RafLit
Adjust batch size to match bucket size
#1354 opened Jun 3, 2025 by xhaihao
Disable contiguous_pa by default on Gaudi2
#1345 opened May 30, 2025 by ccrhx4
Revise DeepSeek-R1 README and update start scripts
#1339 opened May 29, 2025 by taotod
Fix requirements/hpu.txt for the HPU extension
#1336 opened May 29, 2025 by ranzhejiang
Fix prefill warm-up issue
#1335 opened May 29, 2025 by yeonsily (Draft)
Fix vLLM crash when running with lm-eval
#1321 opened May 27, 2025 by ccrhx4
Add flag to speed up Qwen3 FP8 warmup
#1319 opened May 27, 2025 by Yanli2190
[Torch compile] Torch compilation on Sampler
#1314 opened May 26, 2025 by jczaja
Enable MoE for both BF16 and INC-based FP8 on Gaudi
#1309 opened May 23, 2025 by gyou2021
Parallel compile for faster warmup
#1304 opened May 22, 2025 by inkcherry
Optimize transfer time using Mooncake put/get_unsafe
#1297 opened May 22, 2025 by jikunshang