
host_to_device_memcpy_sm lower than device_to_host_memcpy_sm #24

Open
ywxc1997 opened this issue Sep 11, 2024 · 3 comments

Comments

@ywxc1997

Hi!
When we ran host_to_device_memcpy_sm and device_to_host_memcpy_sm separately on an H100 cluster, we got very different bandwidths:

Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0     35.19     35.25     35.30     35.03     35.25     35.32     35.39     35.06

Running device_to_host_memcpy_sm.
memcpy SM CPU(row) <- GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0     52.77     52.77     52.77     52.78     52.76     52.77     52.78     52.77

I would expect the two directions to give similar values.
What could be causing this?

@jessicaSGH

I got similar results on H20 and H800.

@zhangvia

Hey, how did you build nvbandwidth on H100? I get a build failure on an RTX 4090; it says sm89 is not supported.

@eomhyeonpil

eomhyeonpil commented Oct 28, 2024

@ywxc1997
Could you let me know whether your question was answered?
Is the cause the one shown below?
[Image]


4 participants