CUDA/HIP Backend Refactor #839
As a result of how they were designed, there is a good deal of code duplication in the CUDA backends, and the HIP backends have inherited that same duplication. We should make a PR, or series of PRs, specifically designed to refactor and reduce code duplication across these backends.

- What code could/should be combined?
- Where and how do we need to allow for differences between the backends and platforms?
- Where do we want to test to prevent regressions from aggressive or incorrect amalgamation between CUDA and HIP?
- What needs to be done to allow the code generation backends (gpu/cuda/gen and gpu/hip/gen) to share kernels from the other backends?

@tcew, @jedbrown, and anyone else I'm missing who is interested: please feel free to jump into this issue or the discussion with thoughts I'm overlooking.
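For concreteness, one common way to collapse host-side duplication is a thin portability shim that maps a neutral `gpu*` spelling onto either runtime at compile time. The sketch below is purely illustrative: the `gpu-shim.h` file name, the `CEED_BACKEND_HIP` macro, and the `gpu*` aliases are hypothetical, not existing libCEED code.

```cpp
// gpu-shim.h -- hypothetical portability layer (illustrative names only).
// A single backend source file can include this header and compile
// unmodified against either the CUDA or the HIP runtime.
#ifndef GPU_SHIM_H
#define GPU_SHIM_H

#if defined(CEED_BACKEND_HIP)
#include <hip/hip_runtime.h>
typedef hipError_t  gpuError_t;
typedef hipStream_t gpuStream_t;
#define gpuSuccess        hipSuccess
#define gpuMalloc         hipMalloc
#define gpuFree           hipFree
#define gpuMemcpyAsync    hipMemcpyAsync
#define gpuGetErrorString hipGetErrorString
#else
#include <cuda_runtime.h>
typedef cudaError_t  gpuError_t;
typedef cudaStream_t gpuStream_t;
#define gpuSuccess        cudaSuccess
#define gpuMalloc         cudaMalloc
#define gpuFree           cudaFree
#define gpuMemcpyAsync    cudaMemcpyAsync
#define gpuGetErrorString cudaGetErrorString
#endif

#endif // GPU_SHIM_H
```

Whether a layer like this lives in a shared gpu/ directory, or the sources stay separate but mechanically generated, is exactly the kind of design question this refactor should settle.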
Comments

I would unify them. On the topic of implementations that would be specific to HIP or CUDA, I am not aware of such things.
I think a smaller first step could be refactoring the code generation backends to share the kernels that the other backends use. Currently there are some minor differences between them, but I don't know why those differences were added.
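To make that concrete, here is a hedged sketch (not the actual libCEED kernels or build machinery) of how a kernel could be written once and consumed by both gen backends, with the few genuine platform differences isolated at the top of the shared source where they can be documented:

```cpp
// Illustrative shared kernel source only -- not libCEED's real kernels.
// The one real platform difference here (warp vs. wavefront size) is
// isolated behind a macro, so the kernel body is written exactly once
// and the same source can be compiled by either backend family.
#if defined(__HIP_PLATFORM_AMD__)
#define GPU_WARP_SIZE 64  // AMD GCN/CDNA wavefront
#else
#define GPU_WARP_SIZE 32  // NVIDIA warp
#endif

// Trivial grid-stride copy kernel standing in for the kernels the other
// backends already share; the launch bounds use the platform macro.
extern "C" __global__ void __launch_bounds__(4 * GPU_WARP_SIZE)
    SharedCopy(const int n, const double *in, double *out) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x) {
    out[i] = in[i];
  }
}
```

Any remaining differences between the CUDA and HIP variants would then be visible in one file, which also answers where to document why they exist.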
This is a good point: if we gather the code, then we also have to document, in the same place, the reasons for the differences. My proposal above is not a "first step" but a goal; I expect it would break down into several distinct tasks.
For the long-term health of these backends, I think we should do a cleanup and refactor in the near term. Combining kernels across the CUDA and HIP backends should come after this near-term refactor. I don't know enough about performance studies between CUDA and HIP to attempt combining pieces of these two backend 'families' myself, but I do know enough to refactor the backend design into something cleaner. Proposed near-term refactor roadmap:

- PR 2
- PR 3
- PR 4+
I stalled out and focused on some Ratel work before wrapping up the final stage of this issue. @jedbrown, I think this last stage of the GPU …