Replies: 5 comments 6 replies
-
@dmauler1 are you able to provide:

- the Docker run command for your vLLM container
- the vLLM launch command
- the output of running vLLM with gdb

as well as output from the following, described in https://github.com/lamikr/rocm_sdk_builder:

Building Hello World GPU Example App
You should expect to see the following output if the application can communicate with your GPU.

Simple CPU vs GPU benchmarks
A very simple benchmark that shows how to run the same math operation on both the CPU and the GPU is available as a PyTorch program which can be run in a Jupyter notebook. On the CPU the expected time is usually around 20-30 seconds. It can be executed with these commands:
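For anyone wanting to gather this, a rough sketch of the kind of commands being asked for; the image tag comes from this thread, while the model name is only a placeholder:

```bash
# Start the ROCm SDK Builder container with GPU device access
# (a typical ROCm-style docker run; adjust mounts/flags to your setup).
docker run -it --device=/dev/kfd --device=/dev/dri \
    --group-add video --ipc=host \
    lamikr/rocm_sdk_builder:612_01_cdna

# Inside the container: launch the vLLM OpenAI-compatible API server
# (facebook/opt-125m is only a placeholder model).
python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m --tensor-parallel-size 2

# Same launch under gdb: type "run" at the (gdb) prompt and "bt" after a crash.
gdb --args python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m --tensor-parallel-size 2
```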
-
Docker command:
vLLM command:
I'm unclear what running vLLM with gdb means, but if I can be pointed in its direction I'll run it. Here is the output from rocminfo:

```
root@8c5b2f75969b:/# rocminfo
```
-
Also worth noting: I had to run the lspci command on the host, as the container doesn't have the command.
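For reference, host-side and container-side checks along these lines will list the GPUs when lspci is missing inside the container (the grep filters are just illustrative):

```bash
# On the host: list PCI display devices (the container image lacks lspci).
lspci -nn | grep -iE 'vga|display|3d'

# Inside the container: the ROCm view of the same GPUs.
rocminfo | grep -i 'marketing name'
```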
-
It is also likely that your error is more appropriately channeled to the vLLM team, as it seems you have built ROCm successfully, unless you are getting errors in other applications, which the hello_world.sh and Python script seem to indicate is not the case.
-
Unfortunately I have never tried to run vLLM with multiple GPUs. I could in theory try that with my Framework 16 laptop, which has both a gfx1102/7700S discrete GPU and a gfx1103 iGPU. I am still in the process of finalizing the rocm-633 support, and I would say it's probably better to test the multi-GPU support again with that release once it's finished. I have now built rsb 6.3.3 for the gfx1030, gfx1150 and gfx1201 GPUs on the Fedora and Mageia distributions. I still need to test the build process a bit more, however, as I have had to fix some build breaks while building it myself (so it has not yet been tested with a clean build that needs no changes). Once I have time, all the extra apps are also still waiting to be updated to their latest release versions; so far I have only updated llama-cpp among those.
-
Hello,
First of all, thank you so much for putting this project together; being able to run the lamikr/rocm_sdk_builder:612_01_cdna container and have vLLM just work is awesome. So far everything seems to work, but if I try to use multi-GPU with --tensor-parallel-size 2, it eventually errors out with the following exceptions.
```
(VllmWorkerProcess pid=2700) INFO 03-13 02:53:42 model_runner.py:1116] Loading model weights took 1.4478 GB
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.46s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.46s/it]
INFO 03-13 02:53:44 model_runner.py:1116] Loading model weights took 1.4478 GB
INFO 03-13 02:53:44 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20250313-025344.pkl...
INFO 03-13 02:53:44 model_runner_base.py:149] Completed writing input of failed execution to /tmp/err_execute_model_input_20250313-025344.pkl.
ERROR 03-13 02:53:44 engine.py:387] BackendCompilerFailed.__init__() missing 1 required positional argument: 'inner_exception'
```
Is there a more correct way to run vLLM with multi-GPU support, or did I trip over a bug?
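For reference, the launch looks roughly like this; the model name and port below are placeholders rather than the exact values used:

```bash
# Works with a single GPU; adding --tensor-parallel-size 2 produces the errors above.
python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m \
    --tensor-parallel-size 2 \
    --port 8000
```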
Thanks for any insight!