./nvbandwidth
nvbandwidth Version: v0.2
Built from Git version: 6cefdda
NOTE: This tool reports current measured bandwidth on your system.
Additional system-specific tuning may be required to achieve maximal peak bandwidth.
CUDA Runtime Version: 12030
CUDA Driver Version: 12030
Driver Version: 545.29.06
Running device_to_device_memcpy_read_ce.
Invalid value when checking the pattern at <0x7fcd8c000000>
Current offset [ 0/67108864]
Aborted (core dumped)
Nvidia's consumer cards (e.g. the Ada Lovelace generation) don't support peer-to-peer communication, and they also dropped NVLink. So I don't think you will be able to run this test; it fails in device_to_device_memcpy..., as you can see. Nvidia confirms it, too: https://www.tomshardware.com/news/nvidia-confirms-geforce-cards-lack-p2p-support
It will cause a massive slowdown, though, as all data will be routed through CPU RAM, and it doesn't seem to have any effect in nvbandwidth either. All in all, it is a terrible mistake to invest in a workstation with more than one RTX 4090, which I also learned the hard way.
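For anyone who wants to check what the driver itself reports before running the device-to-device tests, here is a minimal sketch that queries peer access for every GPU pair through the CUDA runtime calls `cudaGetDeviceCount` and `cudaDeviceCanAccessPeer`, loaded via `ctypes`. The library name `libcudart.so` and the helper names (`peer_access_matrix`, `load_cuda_runtime`, `format_matrix`) are my own assumptions for illustration and are not part of nvbandwidth; on some systems the library may only be available under a versioned name.

```python
import ctypes


def peer_access_matrix(device_count, can_access_peer):
    """Return matrix[i][j] = True if device i reports direct access to device j.

    The diagonal is always False, since a device is not its own peer.
    """
    return [
        [i != j and bool(can_access_peer(i, j)) for j in range(device_count)]
        for i in range(device_count)
    ]


def load_cuda_runtime(library_name="libcudart.so"):
    """Return (device_count, can_access_peer) backed by the CUDA runtime.

    library_name is an assumption; adjust it to match the local installation.
    """
    cudart = ctypes.CDLL(library_name)

    count = ctypes.c_int(0)
    status = cudart.cudaGetDeviceCount(ctypes.byref(count))
    if status != 0:
        raise RuntimeError(f"cudaGetDeviceCount failed with error code {status}")

    def can_access_peer(device, peer):
        flag = ctypes.c_int(0)
        status = cudart.cudaDeviceCanAccessPeer(ctypes.byref(flag), device, peer)
        if status != 0:
            raise RuntimeError(
                f"cudaDeviceCanAccessPeer({device}, {peer}) failed "
                f"with error code {status}"
            )
        return flag.value == 1

    return count.value, can_access_peer


def format_matrix(matrix):
    """Render the matrix as text, one row per device."""
    lines = ["P2P access, device(row) -> peer(column)"]
    for i, row in enumerate(matrix):
        cells = ["-" if i == j else ("yes" if ok else "no") for j, ok in enumerate(row)]
        lines.append(f"{i}: " + " ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    try:
        n, query = load_cuda_runtime()
        print(format_matrix(peer_access_matrix(n, query)))
    except (OSError, RuntimeError) as exc:
        print(f"Could not query the CUDA runtime: {exc}")
```

If the cards really lack P2P support, I would expect every off-diagonal entry to read `no`, but treat the result as what the driver reports rather than as proof either way.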
NOTE: This tool reports current measured bandwidth on your system.
Additional system-specific tuning may be required to achieve maximal peak bandwidth.
CUDA Runtime Version: 12030
CUDA Driver Version: 12030
Driver Version: 545.29.06
Device 0: NVIDIA GeForce RTX 4090
Device 1: NVIDIA GeForce RTX 4090
Device 2: NVIDIA GeForce RTX 4090
Device 3: NVIDIA GeForce RTX 4090
Device 4: NVIDIA GeForce RTX 4090
Device 5: NVIDIA GeForce RTX 4090
Device 6: NVIDIA GeForce RTX 4090
Device 7: NVIDIA GeForce RTX 4090
Running host_to_device_memcpy_ce.
memcpy CE CPU(row) -> GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0     25.50     24.26     23.90     26.30     26.28     26.38     26.09     26.31
SUM host_to_device_memcpy_ce 205.04
Running device_to_host_memcpy_ce.
memcpy CE CPU(row) <- GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0     26.77     26.47     26.25     26.63     27.10     27.11     27.11     26.83
SUM device_to_host_memcpy_ce 214.28
Running host_to_device_bidirectional_memcpy_ce.
memcpy CE CPU(row) <-> GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0     11.24     14.42     16.15      9.78     18.82     16.46     13.65     19.24
SUM host_to_device_bidirectional_memcpy_ce 119.76
Running device_to_host_bidirectional_memcpy_ce.
memcpy CE CPU(row) <-> GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0     22.35     21.16     21.04     24.01     21.34     21.40     21.41     21.53
SUM device_to_host_bidirectional_memcpy_ce 174.26
Running device_to_device_memcpy_read_ce.
Invalid value when checking the pattern at <0x7fcd8c000000>
Current offset [ 0/67108864]
Aborted (core dumped)