add ScaleToHwAligned for fp8 vllm model loading#118

Merged
changwangss merged 3 commits into HabanaAI:main from changwangss:wangchang/hw_scale
Apr 7, 2025
Conversation

@changwangss
Contributor

@changwangss changwangss commented Mar 21, 2025

Contributor

@michalkuligowski michalkuligowski left a comment


vllm-fork CI fails with those changes, please check tests in HabanaAI/vllm-fork#941

Signed-off-by: changwangss <changwang@habana.ai>
@changwangss changwangss requested a review from dudilester April 7, 2025 11:14
Contributor

@dudilester dudilester left a comment


LGTM

@changwangss changwangss merged commit 145c63d into HabanaAI:main Apr 7, 2025
michalkuligowski pushed a commit to HabanaAI/vllm-fork that referenced this pull request Apr 8, 2025
https://jira.habana-labs.com/browse/SW-207506
The scales provided by the neuralmagic fp8 models are maxabs scales; they need hardware scale alignment to fit the HPU platform and get better accuracy and performance. I added the class `ConvertScaleHwAlign` in vllm-hpu-extension (HabanaAI/vllm-hpu-extension#118) and call it in vllm-fork. The class also includes the device check and computes the factor first if the device is G2.
It is used for loading the models in this collection:
https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127

---------

Signed-off-by: changwangss <changwang@habana.ai>
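The commit message above describes aligning maxabs fp8 scales to hardware-supported values. The sketch below is a minimal illustration of that idea, not the actual `ConvertScaleHwAlign` implementation: it assumes the hardware applies only power-of-two scales, and the G2 variant's exponent-clamping rule is hypothetical.

```python
import math

def scale_to_pow2(scale: float) -> float:
    # Round a maxabs-derived scale up to the next power of two -- one
    # plausible convention for hardware that only applies 2**n scales.
    return 2.0 ** math.ceil(math.log2(scale))

def scale_to_pow2_hw_g2(scale: float) -> float:
    # Hypothetical Gaudi2 (G2) variant: additionally clamp the exponent
    # to a small range of multiples of 4. The real set of scale values
    # supported by the hardware is device-defined and may differ.
    exp = math.ceil(math.log2(scale))
    exp = min(max(4 * math.floor(exp / 4 + 0.5), -8), 8)
    return 2.0 ** exp
```

In this sketch, a checkpoint scale such as 3.0 becomes 4.0 generically, while the G2 path snaps it to the nearest allowed coarser step; the actual class in the extension also performs the device check described in the commit message.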
michalkuligowski added a commit that referenced this pull request Apr 15, 2025
Kacper-Pietkun pushed a commit that referenced this pull request Apr 15, 2025
* add ScaleToHwAligned for fp8 vllm model loading

Signed-off-by: changwangss <changwang@habana.ai>

* remove import

Signed-off-by: changwangss <changwang@habana.ai>

* improve structure

Signed-off-by: changwangss <changwang@habana.ai>

---------

Signed-off-by: changwangss <changwang@habana.ai>

3 participants