Llama 3.1 405B on Gaudi #646
ppatel-eng asked this question in Q&A (unanswered):

We are trying to run Llama 3.1 405B on Gaudi and are running into memory constraints when following the guide below on 8 Gaudi 2 HPUs. Our end goal is to use vllm-fork to serve Llama 3.1 405B, ideally with as little quantization as possible.
https://github.com/HabanaAI/vllm-hpu-extension/blob/main/calibration/README.md
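For context, a minimal sketch of the kind of unquantized, single-node setup described above, using vLLM's offline Python API; the checkpoint name and option values are illustrative and not taken from the thread. In BF16 the 405B weights alone need roughly 810 GB, which already exceeds the ~768 GB of HBM available across eight 96 GB Gaudi 2 cards, consistent with the memory constraints reported here.

```python
# Illustrative sketch only: an unquantized BF16 load of Llama 3.1 405B across
# 8 Gaudi 2 cards is expected to exhaust HBM (~810 GB of weights vs ~768 GB total).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # assumed checkpoint name
    tensor_parallel_size=8,                      # one shard per Gaudi 2 HPU
    dtype="bfloat16",                            # "as little quantization as possible"
)
outputs = llm.generate(["Hello, Gaudi!"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```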
Replies: 2 comments
-
Hi @ppatel-eng, for the time being we don't provide a multi-node solution. Did you have any issues with running the model calibration procedure?
-
Please use https://github.com/HabanaAI/vllm-hpu-extension/tree/main/calibration to quantize the model. Make sure you are using the latest vllm and vllm-hpu-extension versions with the Gaudi PyTorch 1.19 image.
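A hedged sketch of what the calibrate-then-serve flow might look like, assuming the INC-based FP8 path from the Gaudi vLLM documentation; the config path, checkpoint name, and option values (`QUANT_CONFIG`, `quantization="inc"`, `kv_cache_dtype="fp8_inc"`) are assumptions to verify against the linked calibration README and the vllm-fork version in use.

```python
# Sketch of serving the FP8-quantized model after running the calibration step;
# names and flags are assumptions based on the INC/FP8 flow and may differ by release.
import os
from vllm import LLM, SamplingParams

# Quantization config produced by the calibration procedure (path is hypothetical).
os.environ["QUANT_CONFIG"] = "/path/to/maxabs_quant.json"

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # assumed checkpoint name
    tensor_parallel_size=8,                      # one shard per Gaudi 2 HPU
    quantization="inc",                          # Intel Neural Compressor FP8 path
    kv_cache_dtype="fp8_inc",                    # keep the KV cache in FP8 to save HBM
)
outputs = llm.generate(["Hello, Gaudi!"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```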