Pull requests: HabanaAI/vllm-hpu-extension
- #246: Allow usage of fused_block_softmax_adjustment for Qwen with Lazy (Draft), opened Jun 27, 2025 by mswiniarsk
- #244: Switch softmax mode to fast for DeepSeek-R1 inference, opened Jun 27, 2025 by Wei-Lin-Intel
- #225: [DeepSeek] Add automatic Pile-10k dataset processing and extended calibration settings (ready for review), opened Jun 16, 2025 by yiliu30
- #222: [WIP] Add runtime conversion of fp8fn models to fp8fnuz, opened Jun 12, 2025 by kwisniewski98
- #204: Fix max_blocks for warmup decode buckets in case of disabled CONTIGUOUS PA feature, opened May 29, 2025 by iboiko-habana
- #203: Use sets for faster filter checks. Better long context support, opened May 28, 2025 by pi314ever
- #197: [SW-225565] Enable triangular softmax with merged prefill (Draft), opened May 26, 2025 by kamil-kaczor
- #70: Add renormalize parameter for FusedMOE's & modify experts_max arg of mixture_of_experts() (Draft), opened Jan 9, 2025 by tangleintel
- #64: [WIP] Add option to do group sum on TPC instead of MME (Draft), opened Dec 20, 2024 by mswiniarsk