Replies: 5 comments 6 replies
-
@dmauler1 are you able to provide:

- the Docker run command for your vLLM container
- the vLLM launch command
- the output of running vLLM with gdb

as well as output from the following, described in https://github.com/lamikr/rocm_sdk_builder:

Building Hello World GPU Example App
You should expect to see the following output if the application can communicate with your GPU.

Simple CPU vs GPU benchmarks
A very simple benchmark that shows how to run the same math operation on both the CPU and the GPU is available as a PyTorch program which can be run in a Jupyter notebook. On the CPU the expected time is usually around 20-30 seconds. It can be executed with these commands:
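For anyone wanting to gather this, a rough sketch of the kind of commands being asked for; the image tag comes from this thread, while the model name is only a placeholder:

```bash
# Start the ROCm SDK Builder container with GPU device access
# (a typical ROCm-style docker run; adjust mounts/flags to your setup).
docker run -it --device=/dev/kfd --device=/dev/dri \
    --group-add video --ipc=host \
    lamikr/rocm_sdk_builder:612_01_cdna

# Inside the container: launch the vLLM OpenAI-compatible API server
# (facebook/opt-125m is only a placeholder model).
python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m --tensor-parallel-size 2

# Same launch under gdb: type "run" at the (gdb) prompt and "bt" after a crash.
gdb --args python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m --tensor-parallel-size 2
```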
-
Docker command:
vLLM command:
I'm unclear what running vLLM with gdb means, but if I can be pointed in its direction I'll run it. Here is the output from rocminfo:

```
root@8c5b2f75969b:/# rocminfo
```
-
Also worth noting: I had to run the lspci command on the host, as the container doesn't have the command.
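For reference, host-side and container-side checks along these lines will list the GPUs when lspci is missing inside the container (the grep filters are just illustrative):

```bash
# On the host: list PCI display devices (the container image lacks lspci).
lspci -nn | grep -iE 'vga|display|3d'

# Inside the container: the ROCm view of the same GPUs.
rocminfo | grep -i 'marketing name'
```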
-
It is also likely that your error is more appropriately channeled to the vLLM team, as it seems you have built ROCm successfully, unless you are getting errors in other applications, which the hello_world.sh and Python script seem to indicate is not the case.
-
Unfortunately I have never tried to run vLLM with multiple GPUs. I could in theory try that with my Framework 16 laptop, which has both a gfx1102/7700S discrete GPU and a gfx1103 iGPU. I am still in the process of finalizing the rocm-633 support, and I would say it's probably better to test the multi-GPU support again with that release once it's finished. I have now built rsb 6.3.3 for the gfx1030, gfx1150 and gfx1201 GPUs on the Fedora and Mageia distributions. I still need to test the build process a bit more, however, as I have had to fix some build breaks while building it myself (so it has not yet been tested with a clean build that needs no changes). Once I have time, all the extra apps are also still waiting to be updated to their latest release versions; so far I have only updated llama-cpp among those.
-
Hello,
First of all, thank you so much for putting this project together; being able to run the lamikr/rocm_sdk_builder:612_01_cdna container and have vLLM just work is awesome. So far everything seems to work, but if I try to use multi-GPU with --tensor-parallel-size 2, it eventually errors out with the following exceptions.
```
(VllmWorkerProcess pid=2700) INFO 03-13 02:53:42 model_runner.py:1116] Loading model weights took 1.4478 GB
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.46s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.46s/it]
INFO 03-13 02:53:44 model_runner.py:1116] Loading model weights took 1.4478 GB
INFO 03-13 02:53:44 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20250313-025344.pkl...
INFO 03-13 02:53:44 model_runner_base.py:149] Completed writing input of failed execution to /tmp/err_execute_model_input_20250313-025344.pkl.
ERROR 03-13 02:53:44 engine.py:387] BackendCompilerFailed.__init__() missing 1 required positional argument: 'inner_exception'
```
Is there a more correct way to run vLLM with multi-GPU support, or did I trip over a bug?
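For reference, the launch looks roughly like this; the model name and port below are placeholders rather than the exact values used:

```bash
# Works with a single GPU; adding --tensor-parallel-size 2 produces the errors above.
python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m \
    --tensor-parallel-size 2 \
    --port 8000
```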
Thanks for any insight!