-
Notifications
You must be signed in to change notification settings - Fork 225
Avoid vendoring NVTX #1930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-25.08
Are you sure you want to change the base?
Avoid vendoring NVTX #1930
Conversation
I believe the failures on CUDA 11 are indicative of a problem in include paths. The failures look like this:
but it's caused by the build finding the wrong CCCL headers, probably due to the ordering here:
I'll need to investigate how to make the conda prefix |
I'm fine doing this if we can get it working now. I forgot the plan from NVIDIA/cccl#1939 (comment) when I opened rapidsai/build-planning#183, where I proposed waiting to do cccl until we moved to 3.0. Up to you, but I sort of anticipated things being easier if we waited to do this until we dropped cuda 11 (which I think you are observing now with the build failures). We could reduce the scope of this pr to just nvtx if you prefer. |
@vyasr Yes, I think scoping to just NVTX is a good start. The problem is that the failure on CUDA 11 is actually just an indication that we have the wrong order of include directories. The same problem applies to CUDA 12, it just happens that the CUDA 12 included CCCL is closer to the CCCL (conda package) that we actually want, so it doesn’t fail to build. We are still getting a mix of CCCL versions. I would like to try to solve this before we migrate to CCCL 3.0, just to iron out any other problems with the devendoring. It’s okay if we update to 3.0 first, either way is fine. |
OK makes sense. Now that we've started the CUDA 11 drops we'll need to make sure that we remember to debug this issue since we won't observe it in CI anymore. In the meantime scoping this PR to only focus on NVTX seems sensible. Maybe open an issue for the CCCL include order bit so that we don't forget? |
As I think more about this, I see some tension. We would prefer a hard pinning on Some options:
|
Might not work well for CCCL or won't work well for CCCL? This seems more straightforward than splitting the package (but if we'll have to end up splitting it in the near-future anyway, we might as well bite the bullet) |
An unpinned CCCL would make building and testing too difficult. We need CCCL pinned at build time. But it’s hard to have pinned dependencies for packages that are mostly header-only. Basically you have to split to a “-dev” package or you have to tell the consumer to supply their own NVTX and CCCL. That is also an okay outcome with me. Maybe we should start there - pin at build time and add a run dependency that is unpinned. If consumers have struggles with building downstream they can add their own constraints. |
That seems like a reasonable division of labor. Installing packages that use |
Is it possible/realistic to move all uses of nvtx out of rmm headers and into implementation files (as part of the move to rmm as a shared library)? Historically the reason that we have needed to export a dependency on nvtx is that rmm is header-only and so the nvtx usage in headers is effectively "public", but since all we're using it for is ranges in theory we should be able to move all that logic into source files and isolate the dependency from consumers altogether. That doesn't solve the problem for CCCL since we actually pass CCCL types across ABI boundaries. That's a problem for another day. |
I think that’s worth exploring. There are sizable parts that are headers because they’re templated, so I’d want to explore the device_uvector and other templated types’ usage of NVTX first. |
Agreed, the templating makes this challenging. I don't know if it is feasible to completely avoid exporting some NVTX dependency to consumers, but I would like us to explore how far we can go with that. |
Description
Addresses part of #1833.
Checklist