Conversation


@bdice bdice commented May 28, 2025

Description

Addresses part of #1833.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@bdice bdice requested a review from a team as a code owner May 28, 2025 13:04
@bdice bdice requested a review from gforsyth May 28, 2025 13:04
@github-actions github-actions bot added the conda label May 28, 2025

bdice commented May 28, 2025

I believe the failures on CUDA 11 indicate a problem with include paths. The failures look like this:

│ │ sccache $BUILD_PREFIX/bin/nvcc -forward-unknown-to-host-compiler -DCUB_DISABLE_NAMESPACE_MAGIC -DCUB_IGNORE_NAMESPACE_MAGIC_ERROR -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -DRMM_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_INFO -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_DISABLE_ABI_NAMESPACE -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTHRUST_IGNORE_ABI_NAMESPACE_ERROR -I$SRC_DIR/cpp -I$SRC_DIR/cpp/include -I$SRC_DIR/cpp/build/include -isystem $SRC_DIR/cpp/build/_deps/gtest-src/googlemock/include -isystem $SRC_DIR/cpp/build/_deps/gtest-src/googlemock -isystem $SRC_DIR/cpp/build/_deps/gtest-src/googletest/include -isystem $SRC_DIR/cpp/build/_deps/gtest-src/googletest -isystem /usr/local/cuda/targets/x86_64-linux/include -isystem $PREFIX/include -isystem $BUILD_PREFIX/include -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIE "-Xcompiler=-ffile-prefix-map=$PREFIX/=''" -MD -MT tests/CMakeFiles/THRUST_ALLOCATOR_TEST.dir/mr/device/thrust_allocator_tests.cu.o -MF tests/CMakeFiles/THRUST_ALLOCATOR_TEST.dir/mr/device/thrust_allocator_tests.cu.o.d -x cu -c $SRC_DIR/cpp/tests/mr/device/thrust_allocator_tests.cu -o tests/CMakeFiles/THRUST_ALLOCATOR_TEST.dir/mr/device/thrust_allocator_tests.cu.o
 │ │ In file included from $PREFIX/include/cuda/stream_ref:53,
 │ │                  from $SRC_DIR/cpp/include/rmm/cuda_stream_view.hpp:21,
 │ │                  from $SRC_DIR/cpp/include/rmm/mr/device/system_memory_resource.hpp:19,
 │ │                  from $SRC_DIR/cpp/tests/mr/device/test_utils.hpp:20,
 │ │                  from $SRC_DIR/cpp/tests/mr/device/mr_ref_test.hpp:20,
 │ │                  from $SRC_DIR/cpp/tests/mr/device/thrust_allocator_tests.cu:17:
 │ │ $PREFIX/include/cuda/std/__cuda/api_wrapper.h:24:24: error: missing binary operator before token "("
 │ │    24 | #if _CCCL_CUDA_COMPILER(CLANG)
 │ │       |                        ^

but it's caused by the build finding the wrong CCCL headers, probably due to the include ordering here:

-isystem /usr/local/cuda/targets/x86_64-linux/include -isystem $PREFIX/include

I'll need to investigate how to make the conda prefix's include/ come before the conda CUDA packages' include/, so that the build discovers the CCCL conda package we installed.
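One possible mitigation, sketched below under the assumption that the target is named `rmm` and that the `PREFIX` environment variable points at the conda prefix (as it does under conda-build): CMake's `BEFORE` keyword prepends include directories, which might push the conda prefix's headers ahead of the CUDA toolkit's include path.

```cmake
# Hypothetical sketch: prepend the conda prefix's include directory so the
# CCCL headers from the conda package are found before the CUDA toolkit's
# vendored copy in /usr/local/cuda/targets/<arch>/include.
if(DEFINED ENV{PREFIX})
  target_include_directories(rmm SYSTEM BEFORE INTERFACE "$ENV{PREFIX}/include")
endif()
```

Whether this actually wins over the toolkit path depends on how CMake orders and deduplicates `-isystem` flags, so it would need to be verified against the failing CUDA 11 build.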


vyasr commented May 29, 2025

I'm fine doing this if we can get it working now. I forgot the plan from NVIDIA/cccl#1939 (comment) when I opened rapidsai/build-planning#183, where I proposed waiting to de-vendor CCCL until we moved to CCCL 3.0. Up to you, but I anticipated things being easier if we waited until we dropped CUDA 11 (which I think you are observing now with the build failures). We could reduce the scope of this PR to just NVTX if you prefer.


bdice commented May 29, 2025

@vyasr Yes, I think scoping to just NVTX is a good start.

The problem is that the failure on CUDA 11 is actually just an indication that we have the wrong order of include directories. The same problem applies to CUDA 12; it just happens that the CCCL included with CUDA 12 is closer to the CCCL conda package we actually want, so the build doesn't fail. We are still getting a mix of CCCL versions.

I would like to try to solve this before we migrate to CCCL 3.0, just to iron out any other problems with the de-vendoring. It's okay if we update to 3.0 first; either way is fine.


vyasr commented May 30, 2025

OK, makes sense. Now that we've started dropping CUDA 11, we'll need to make sure we remember to debug this issue, since we won't observe it in CI anymore. In the meantime, scoping this PR to focus only on NVTX seems sensible. Maybe open an issue for the CCCL include ordering so that we don't forget?

@bdice bdice changed the title from "Avoid vendoring NVTX and CCCL." to "Avoid vendoring NVTX" May 30, 2025
@bdice bdice added the non-breaking (Non-breaking change) and improvement (Improvement / enhancement to an existing function) labels May 30, 2025
@bdice bdice self-assigned this May 30, 2025

bdice commented Jun 25, 2025

As I think more about this, I see some tension. We would prefer a hard pin on nvtx-c so that we match exactly what is in rapids-cmake's versions.json. However, as librmm is used today, it is a dependency of all of RAPIDS, so if librmm pins nvtx-c it becomes impossible for someone to use a different nvtx-c pin in their environment. We have a few ways we could solve this -- more ideas would be helpful.

Some options:

@gforsyth

  • This might be okay for NVTX but might not work well for CCCL. A consistent solution would be ideal.

"Might not work well" for CCCL, or "won't work well" for CCCL? This seems more straightforward than splitting the package (but if we'll have to split it in the near future anyway, we might as well bite the bullet).


bdice commented Jun 25, 2025

An unpinned CCCL would make building and testing too difficult; we need CCCL pinned at build time. But it's hard to have pinned dependencies for packages that are mostly header-only: basically, you either have to split out a "-dev" package or tell the consumer to supply their own NVTX and CCCL. That is also an okay outcome with me. Maybe we should start there: pin at build time and add a run dependency that is unpinned. If consumers struggle with downstream builds, they can add their own constraints.
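The "pin in host, leave run open" idea might look like the following conda recipe fragment (package names follow the discussion; the version numbers are purely illustrative):

```yaml
# Hypothetical recipe fragment: exact pin at build time (host), unpinned at
# run time so consumers can apply their own constraints.
requirements:
  host:
    - cccl 2.8.*      # illustrative; would match rapids-cmake's versions.json
    - nvtx-c 3.1.*
  run:
    - cccl            # unpinned run dependency
    - nvtx-c
```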

@gforsyth

Maybe we should start there - pin at build time and add a run dependency that is unpinned. If consumers have struggles with building downstream they can add their own constraints.

That seems like a reasonable division of labor: installing packages that use rmm doesn't unnecessarily constrain the environment, while slightly more care is required to use rmm as a build dependency.


vyasr commented Jun 25, 2025

Is it possible/realistic to move all uses of NVTX out of rmm headers and into implementation files (as part of the move to rmm as a shared library)? Historically, the reason we have needed to export a dependency on NVTX is that rmm is header-only, so the NVTX usage in headers is effectively "public". But since all we're using it for is ranges, in theory we should be able to move that logic into source files and isolate the dependency from consumers altogether.

That doesn't solve the problem for CCCL since we actually pass CCCL types across ABI boundaries. That's a problem for another day.


bdice commented Jun 25, 2025

I think that's worth exploring. There are sizable parts that are headers because they're templated, so I'd want to explore how device_uvector and other templated types use NVTX first.


vyasr commented Jun 30, 2025

Agreed, the templating makes this challenging. I don't know if it is feasible to completely avoid exporting some NVTX dependency to consumers, but I would like us to explore how far we can go with that.
