host_to_device_memcpy_sm
device_to_host_memcpy_sm
Hi! When we tested host_to_device_memcpy_sm and device_to_host_memcpy_sm separately on the H100 cluster, we obtained two completely different values:
Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
          0         1         2         3         4         5         6         7
0     35.19     35.25     35.30     35.03     35.25     35.32     35.39     35.06
Running device_to_host_memcpy_sm.
memcpy SM CPU(row) <- GPU(column) bandwidth (GB/s)
          0         1         2         3         4         5         6         7
0     52.77     52.77     52.77     52.78     52.76     52.77     52.78     52.77
We expected the two values to be close. What could be causing this?
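To put a number on the gap, here is a minimal Python sketch that summarizes the two result rows from the output above. The bandwidth values are copied from the report; the helper name `summarize` and the variable names are hypothetical and are not part of nvbandwidth.

```python
from statistics import mean

# Bandwidth values (GB/s) for GPUs 0-7, copied from the nvbandwidth output above.
h2d = [35.19, 35.25, 35.30, 35.03, 35.25, 35.32, 35.39, 35.06]  # host_to_device_memcpy_sm
d2h = [52.77, 52.77, 52.77, 52.78, 52.76, 52.77, 52.78, 52.77]  # device_to_host_memcpy_sm


def summarize(values):
    """Return (mean, spread) where spread is max minus min across GPUs."""
    return mean(values), max(values) - min(values)


h2d_mean, h2d_spread = summarize(h2d)
d2h_mean, d2h_spread = summarize(d2h)

# Ratio of device-to-host over host-to-device mean bandwidth.
ratio = d2h_mean / h2d_mean

print(f"H2D mean: {h2d_mean:.2f} GB/s (spread {h2d_spread:.2f})")
print(f"D2H mean: {d2h_mean:.2f} GB/s (spread {d2h_spread:.2f})")
print(f"D2H / H2D ratio: {ratio:.2f}")
```

On these numbers the device-to-host direction is roughly 1.5x faster than host-to-device, and the small spread within each row shows the gap is consistent across all eight GPUs rather than caused by a single outlier.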
I got similar test results on H20 and H800.
Hey, how did you build nvbandwidth on H100? I get a build failure on an RTX 4090; it says sm89 is not supported.
@ywxc1997 Could you let me know if you got your questions answered? Is the reason as below?