Pull requests: HabanaAI/vllm-hpu-extension
- #246: Allow usage of fused_block_softmax_adjustment for Qwen with Lazy (Draft), opened Jun 27, 2025 by mswiniarsk
- #244: Switch softmax mode to fast for DeepSeek-R1 inference, opened Jun 27, 2025 by Wei-Lin-Intel
- #225: [DeepSeek] Add automatic Pile-10k dataset processing and extended calibration settings (ready for review), opened Jun 16, 2025 by yiliu30
- #222: [WIP] Add runtime conversion of fp8fn models to fp8fnuz, opened Jun 12, 2025 by kwisniewski98
- #204: Fix max_blocks for warmup decode buckets in case of disabled CONTIGUOUS PA feature, opened May 29, 2025 by iboiko-habana
- #203: Use sets for faster filter checks. Better long context support, opened May 28, 2025 by pi314ever
- #197: [SW-225565] Enable triangular softmax with merged prefill (Draft), opened May 26, 2025 by kamil-kaczor
- #70: Add renormalize parameter for FusedMOE's & modify experts_max arg of mixture_of_experts() (Draft), opened Jan 9, 2025 by tangleintel
- #64: [WIP] Add option to do group sum on TPC instead of MME (Draft), opened Dec 20, 2024 by mswiniarsk