Does HF TGI support a Multi-Node Multi-GPU server setup? #1561
Unanswered
ansSanthoshM asked this question in Q&A
Replies: 3 comments · 6 replies
-
Please let me know the developers' comments on this. It would help me decide my next course of action.
-
Hi! We would like to know the same!
-
I have it working on a single node with multiple GPUs. For scaling, I'm planning to run a load balancer over multiple instances.
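For illustration, here is a minimal client-side sketch of that pattern, assuming several independently launched single-node TGI instances (the `node-a`/`node-b` host names and ports are placeholders, not anything TGI provides). A production setup would more likely put nginx, HAProxy, or a Kubernetes Service in front instead:

```python
from itertools import cycle

from huggingface_hub import InferenceClient

# One TGI instance per node; each instance is itself sharded across its
# local GPUs. Host names and ports below are placeholders.
endpoints = cycle([
    InferenceClient(model="http://node-a:8080"),
    InferenceClient(model="http://node-b:8080"),
])

def generate(prompt: str) -> str:
    # Round-robin: each call goes to the next instance in the rotation.
    client = next(endpoints)
    return client.text_generation(prompt, max_new_tokens=64)

print(generate("What does --num-shard do in TGI?"))
```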
-
Hi Team,
I have two machines; each machine has 4 NVIDIA GPUs with 46GB of VRAM each, so each machine has 184GB of VRAM.
The two machines are set up as a cluster, so the cluster has 8 GPUs and 368GB of VRAM in total.
I want to load two LLMs on this cluster: 1) Llama2-70B-Chat and 2) Llama2-70B-Code. Each of these LLMs consumes 168GB of VRAM, so loading both requires 336GB of VRAM in total. I am therefore thinking of a multi-node multi-GPU server configuration, i.e. 2 nodes with 4 GPUs each.
Is it possible to set up a TGI server on this cluster configuration, so that I can create two Docker container endpoints, one per LLM, with both sharing the same hardware?
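For what it's worth, since each model (168GB) fits within a single node's 184GB, one possible layout avoids cross-node sharding entirely: run one TGI container per node, sharded over that node's 4 local GPUs via `--num-shard 4`, and expose the two models as two separate endpoints. A minimal sketch under that assumption (host names, ports, and the exact model ids are placeholders, not taken from the thread):

```python
# Assumed launch, one container per node (placeholder model ids and ports):
#   node-1: docker run --gpus all -p 8080:80 \
#       ghcr.io/huggingface/text-generation-inference \
#       --model-id meta-llama/Llama-2-70b-chat-hf --num-shard 4
#   node-2: docker run --gpus all -p 8080:80 \
#       ghcr.io/huggingface/text-generation-inference \
#       --model-id codellama/CodeLlama-70b-hf --num-shard 4
#
# Each model then lives entirely on one node, so no cross-node tensor
# parallelism is needed; clients simply pick the endpoint they want.
from huggingface_hub import InferenceClient

chat_client = InferenceClient(model="http://node-1:8080")  # Llama2-70B-Chat
code_client = InferenceClient(model="http://node-2:8080")  # Llama2-70B-Code

def ask(prompt: str, task: str = "chat") -> str:
    # Route code prompts to the code model, everything else to chat.
    client = code_client if task == "code" else chat_client
    return client.text_generation(prompt, max_new_tokens=128)

print(ask("Write a Python function that reverses a string.", task="code"))
```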