Looking at your code and comparing the generated PTX, it looks like when using a type other than https://godbolt.org/z/TGY1qs8x8 An infinite loop results in undefined behavior, so the compiler is free to throw away the kernel entirely, which is what is happening here. I don't know why it doesn't do this for For reference, the
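To illustrate the point about infinite loops, here is a minimal host-side C++ sketch, not the kernel from the original post. Under the C++ forward-progress rules, a compiler may assume that a loop eventually terminates unless it performs I/O, a volatile access, or an atomic or synchronization operation, which is why a side-effect-free spin can be optimized away. `wait_for_flag` is a hypothetical helper introduced only for this example: it polls an atomic flag on every iteration and also has a timeout, so the loop is well-defined and always has a way out.

```cpp
#include <atomic>
#include <chrono>

// Hypothetical helper (not from the original post): polls `flag` until it
// becomes true or until `timeout` has elapsed.
// Returns true if the flag was observed as set, false on timeout.
// Every iteration performs an atomic load, so the loop is not one the
// compiler may assume away as a side-effect-free infinite loop.
bool wait_for_flag(const std::atomic<bool>& flag,
                   std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!flag.load(std::memory_order_acquire)) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // give up instead of spinning forever
        }
    }
    return true;
}
```

If the goal of the kernel is to keep the device busy for testing, giving the loop an observable effect or a bounded exit condition like this is one way to stop the optimizer from discarding it. Whether the same change makes your device code behave consistently across element types is something I have not verified.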
So this blocks the process on the CPU and behaves as if it were a synchronous for_each. If I remove the half or replace it with float, it works as expected.