
add ScaleToHwAligned for fp8 vllm model loading #118


Merged
3 commits merged into HabanaAI:main on Apr 7, 2025

Conversation

@changwangss (Contributor) commented Mar 21, 2025

@michalkuligowski (Contributor) left a comment


vllm-fork CI fails with these changes; please check the tests in HabanaAI/vllm-fork#941

Signed-off-by: changwangss <[email protected]>
@changwangss requested a review from dudilester on Apr 7, 2025 at 11:14
@dudilester (Contributor) left a comment


LGTM

@changwangss merged commit 145c63d into HabanaAI:main on Apr 7, 2025
michalkuligowski pushed a commit to HabanaAI/vllm-fork that referenced this pull request on Apr 8, 2025

https://jira.habana-labs.com/browse/SW-207506

The scales provided by neuralmagic's FP8 quantization method are maxabs scales; they need hardware scale alignment to adapt to the HPU platform for better accuracy and performance. I added the class `ConvertScaleHwAlign` in vllm-hpu-extension (HabanaAI/vllm-hpu-extension#118) and call it in vllm-fork. The class also includes a device check and applies the factor first if the device is G2 (Gaudi2). It is used for loading the models in this collection:
https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127

---------

Signed-off-by: changwangss <[email protected]>
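
For context, here is a minimal sketch of what this kind of hardware scale alignment could look like. This is not the actual `ConvertScaleHwAlign`/`ScaleToHwAligned` code from this PR; the function name, the power-of-two rounding, and the Gaudi2 range factor (assuming the checkpoint scales target OCP E4M3 max 448 while Gaudi2's E4M3 max is 240) are illustrative assumptions.

```python
# Hypothetical sketch of HPU scale alignment; not the code added in this PR.
import torch

OCP_FP8_E4M3_MAX = 448.0     # E4M3 max the checkpoint's maxabs scales target (assumed)
GAUDI2_FP8_E4M3_MAX = 240.0  # Gaudi2's reduced E4M3 max (assumed reason for the G2 factor)

def align_scale_to_hw(scale: torch.Tensor, is_gaudi2: bool) -> torch.Tensor:
    """Convert a maxabs-derived scale into a hardware-friendly one.

    On Gaudi2 (G2) the scale is first multiplied by a range factor,
    mirroring the commit's "make factor first if the device is G2";
    the result is then rounded up to the nearest power of two, a
    common way to align scales with what the accelerator applies.
    """
    if is_gaudi2:
        scale = scale * (OCP_FP8_E4M3_MAX / GAUDI2_FP8_E4M3_MAX)
    return torch.pow(2.0, torch.ceil(torch.log2(scale)))

# Example: align_scale_to_hw(torch.tensor([0.013]), is_gaudi2=True)
```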
imangohari1 pushed a commit to imangohari1/vllm-fork that referenced this pull request on Apr 8, 2025 (same commit message as above)
tvoas pushed a commit to tvoas/vllm-fork that referenced this pull request on Apr 9, 2025 (same commit message as above)
bkowalskiINTEL pushed a commit to HabanaAI/vllm-fork that referenced this pull request on Apr 14, 2025 (same commit message as above)
michalkuligowski added a commit that referenced this pull request on Apr 15, 2025
Kacper-Pietkun pushed a commit that referenced this pull request on Apr 15, 2025
* add ScaleToHwAligned for fp8 vllm model loading

Signed-off-by: changwangss <[email protected]>

* remove import

Signed-off-by: changwangss <[email protected]>

* improve structure

Signed-off-by: changwangss <[email protected]>

---------

Signed-off-by: changwangss <[email protected]>