
add ScaleToHwAligned for fp8 vllm model loading #118


Merged
3 commits merged into HabanaAI:main on Apr 7, 2025

Conversation

@changwangss (Contributor) commented Mar 21, 2025

@michalkuligowski (Contributor) left a comment


vllm-fork CI fails with these changes; please check the tests in HabanaAI/vllm-fork#941

Signed-off-by: changwangss <[email protected]>
@changwangss requested a review from dudilester on Apr 7, 2025 at 11:14
@dudilester (Contributor) left a comment


LGTM

@changwangss merged commit 145c63d into HabanaAI:main on Apr 7, 2025
michalkuligowski pushed a commit to HabanaAI/vllm-fork that referenced this pull request on Apr 8, 2025

https://jira.habana-labs.com/browse/SW-207506

The scales provided by neuralmagic's FP8 quantization method are maxabs scales; they need hardware scale alignment to adapt to the HPU platform for better accuracy and performance. I added the class `ConvertScaleHwAlign` in vllm-hpu-extension (HabanaAI/vllm-hpu-extension#118) and call it in vllm-fork. The class also includes a device check and applies the factor first if the device is G2 (Gaudi2). It is used for loading the models in this collection:
https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127

---------

Signed-off-by: changwangss <[email protected]>
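
For context, here is a minimal sketch of what this kind of hardware scale alignment could look like. This is not the actual `ConvertScaleHwAlign`/`ScaleToHwAligned` code from this PR; the function name, the power-of-two rounding, and the Gaudi2 range factor (assuming the checkpoint scales target OCP E4M3 max 448 while Gaudi2's E4M3 max is 240) are illustrative assumptions.

```python
# Hypothetical sketch of HPU scale alignment; not the code added in this PR.
import torch

OCP_FP8_E4M3_MAX = 448.0     # E4M3 max the checkpoint's maxabs scales target (assumed)
GAUDI2_FP8_E4M3_MAX = 240.0  # Gaudi2's reduced E4M3 max (assumed reason for the G2 factor)

def align_scale_to_hw(scale: torch.Tensor, is_gaudi2: bool) -> torch.Tensor:
    """Convert a maxabs-derived scale into a hardware-friendly one.

    On Gaudi2 (G2) the scale is first multiplied by a range factor,
    mirroring the commit's "make factor first if the device is G2";
    the result is then rounded up to the nearest power of two, a
    common way to align scales with what the accelerator applies.
    """
    if is_gaudi2:
        scale = scale * (OCP_FP8_E4M3_MAX / GAUDI2_FP8_E4M3_MAX)
    return torch.pow(2.0, torch.ceil(torch.log2(scale)))

# Example: align_scale_to_hw(torch.tensor([0.013]), is_gaudi2=True)
```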
imangohari1 pushed a commit to imangohari1/vllm-fork that referenced this pull request on Apr 8, 2025 (same commit message as above)
tvoas pushed a commit to tvoas/vllm-fork that referenced this pull request on Apr 9, 2025 (same commit message as above)
bkowalskiINTEL pushed a commit to HabanaAI/vllm-fork that referenced this pull request on Apr 14, 2025 (same commit message as above)
michalkuligowski added a commit that referenced this pull request on Apr 15, 2025
Kacper-Pietkun pushed a commit that referenced this pull request on Apr 15, 2025
* add ScaleToHwAligned for fp8 vllm model loading

Signed-off-by: changwangss <[email protected]>

* remove import

Signed-off-by: changwangss <[email protected]>

* improve structure

Signed-off-by: changwangss <[email protected]>

---------

Signed-off-by: changwangss <[email protected]>