Conversation


@bdice bdice commented May 28, 2025

Description

Addresses part of #1833.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@bdice bdice requested a review from a team as a code owner May 28, 2025 13:04
@bdice bdice requested a review from gforsyth May 28, 2025 13:04
@github-actions github-actions bot added the conda label May 28, 2025

bdice commented May 28, 2025

I believe the failures on CUDA 11 indicate a problem with include paths. The failures look like this:

│ │ sccache $BUILD_PREFIX/bin/nvcc -forward-unknown-to-host-compiler -DCUB_DISABLE_NAMESPACE_MAGIC -DCUB_IGNORE_NAMESPACE_MAGIC_ERROR -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -DRMM_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_INFO -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_DISABLE_ABI_NAMESPACE -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTHRUST_IGNORE_ABI_NAMESPACE_ERROR -I$SRC_DIR/cpp -I$SRC_DIR/cpp/include -I$SRC_DIR/cpp/build/include -isystem $SRC_DIR/cpp/build/_deps/gtest-src/googlemock/include -isystem $SRC_DIR/cpp/build/_deps/gtest-src/googlemock -isystem $SRC_DIR/cpp/build/_deps/gtest-src/googletest/include -isystem $SRC_DIR/cpp/build/_deps/gtest-src/googletest -isystem /usr/local/cuda/targets/x86_64-linux/include -isystem $PREFIX/include -isystem $BUILD_PREFIX/include -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIE "-Xcompiler=-ffile-prefix-map=$PREFIX/=''" -MD -MT tests/CMakeFiles/THRUST_ALLOCATOR_TEST.dir/mr/device/thrust_allocator_tests.cu.o -MF tests/CMakeFiles/THRUST_ALLOCATOR_TEST.dir/mr/device/thrust_allocator_tests.cu.o.d -x cu -c $SRC_DIR/cpp/tests/mr/device/thrust_allocator_tests.cu -o tests/CMakeFiles/THRUST_ALLOCATOR_TEST.dir/mr/device/thrust_allocator_tests.cu.o
 │ │ In file included from $PREFIX/include/cuda/stream_ref:53,
 │ │                  from $SRC_DIR/cpp/include/rmm/cuda_stream_view.hpp:21,
 │ │                  from $SRC_DIR/cpp/include/rmm/mr/device/system_memory_resource.hpp:19,
 │ │                  from $SRC_DIR/cpp/tests/mr/device/test_utils.hpp:20,
 │ │                  from $SRC_DIR/cpp/tests/mr/device/mr_ref_test.hpp:20,
 │ │                  from $SRC_DIR/cpp/tests/mr/device/thrust_allocator_tests.cu:17:
 │ │ $PREFIX/include/cuda/std/__cuda/api_wrapper.h:24:24: error: missing binary operator before token "("
 │ │    24 | #if _CCCL_CUDA_COMPILER(CLANG)
 │ │       |                        ^

but it's caused by the build finding the wrong CCCL headers, probably due to the include ordering here:

-isystem /usr/local/cuda/targets/x86_64-linux/include -isystem $PREFIX/include

I'll need to investigate how to make the conda prefix's include/ come before the conda CUDA packages' include/, so that the build discovers the CCCL conda package we installed.
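One possible mitigation, sketched below under the assumption that the target is named `rmm` and that the `PREFIX` environment variable points at the conda prefix (as it does under conda-build): CMake's `BEFORE` keyword prepends include directories, which might push the conda prefix's headers ahead of the CUDA toolkit's include path.

```cmake
# Hypothetical sketch: prepend the conda prefix's include directory so the
# CCCL headers from the conda package are found before the CUDA toolkit's
# vendored copy in /usr/local/cuda/targets/<arch>/include.
if(DEFINED ENV{PREFIX})
  target_include_directories(rmm SYSTEM BEFORE INTERFACE "$ENV{PREFIX}/include")
endif()
```

Whether this actually wins over the toolkit path depends on how CMake orders and deduplicates `-isystem` flags, so it would need to be verified against the failing CUDA 11 build.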


vyasr commented May 29, 2025

I'm fine doing this if we can get it working now. I forgot the plan from NVIDIA/cccl#1939 (comment) when I opened rapidsai/build-planning#183, where I proposed waiting to de-vendor CCCL until we moved to CCCL 3.0. Up to you, but I anticipated things being easier if we waited until we dropped CUDA 11 (which I think you are observing now with the build failures). We could reduce the scope of this PR to just NVTX if you prefer.


bdice commented May 29, 2025

@vyasr Yes, I think scoping to just NVTX is a good start.

The problem is that the failure on CUDA 11 is actually just an indication that we have the wrong order of include directories. The same problem applies to CUDA 12; it just happens that the CCCL included with CUDA 12 is closer to the CCCL conda package we actually want, so the build doesn't fail. We are still getting a mix of CCCL versions.

I would like to try to solve this before we migrate to CCCL 3.0, just to iron out any other problems with the de-vendoring. It's okay if we update to 3.0 first; either way is fine.


vyasr commented May 30, 2025

OK, makes sense. Now that we've started dropping CUDA 11, we'll need to make sure we remember to debug this issue, since we won't observe it in CI anymore. In the meantime, scoping this PR to focus only on NVTX seems sensible. Maybe open an issue for the CCCL include ordering so that we don't forget?

@bdice bdice changed the title from "Avoid vendoring NVTX and CCCL." to "Avoid vendoring NVTX" May 30, 2025
@bdice bdice added the non-breaking (Non-breaking change) and improvement (Improvement / enhancement to an existing function) labels May 30, 2025
@bdice bdice self-assigned this May 30, 2025

bdice commented Jun 25, 2025

As I think more about this, I see some tension. We would prefer a hard pin on nvtx-c so that we match exactly what is in rapids-cmake's versions.json. However, as librmm is used today, it is a dependency of all of RAPIDS, so if librmm pins nvtx-c it becomes impossible for someone to use a different nvtx-c pin in their environment. We have a few ways we could solve this -- more ideas would be helpful.

Some options:

@gforsyth

  • This might be okay for NVTX but might not work well for CCCL. A consistent solution would be ideal.

"Might not work well" for CCCL, or "won't work well" for CCCL? This seems more straightforward than splitting the package (but if we'll have to split it in the near future anyway, we might as well bite the bullet).


bdice commented Jun 25, 2025

An unpinned CCCL would make building and testing too difficult; we need CCCL pinned at build time. But it's hard to have pinned dependencies for packages that are mostly header-only: basically, you either have to split out a "-dev" package or tell the consumer to supply their own NVTX and CCCL. That is also an okay outcome with me. Maybe we should start there: pin at build time and add a run dependency that is unpinned. If consumers struggle with downstream builds, they can add their own constraints.
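The "pin in host, leave run open" idea might look like the following conda recipe fragment (package names follow the discussion; the version numbers are purely illustrative):

```yaml
# Hypothetical recipe fragment: exact pin at build time (host), unpinned at
# run time so consumers can apply their own constraints.
requirements:
  host:
    - cccl 2.8.*      # illustrative; would match rapids-cmake's versions.json
    - nvtx-c 3.1.*
  run:
    - cccl            # unpinned run dependency
    - nvtx-c
```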

@gforsyth

Maybe we should start there - pin at build time and add a run dependency that is unpinned. If consumers have struggles with building downstream they can add their own constraints.

That seems like a reasonable division of labor: installing packages that use rmm doesn't unnecessarily constrain the environment, while slightly more care is required to use rmm as a build dependency.


vyasr commented Jun 25, 2025

Is it possible/realistic to move all uses of NVTX out of rmm headers and into implementation files (as part of the move to rmm as a shared library)? Historically, the reason we have needed to export a dependency on NVTX is that rmm is header-only, so the NVTX usage in headers is effectively "public". But since all we're using it for is ranges, in theory we should be able to move that logic into source files and isolate the dependency from consumers altogether.

That doesn't solve the problem for CCCL since we actually pass CCCL types across ABI boundaries. That's a problem for another day.


bdice commented Jun 25, 2025

I think that's worth exploring. There are sizable parts that are headers because they're templated, so I'd want to explore how device_uvector and other templated types use NVTX first.


vyasr commented Jun 30, 2025

Agreed, the templating makes this challenging. I don't know if it is feasible to completely avoid exporting some NVTX dependency to consumers, but I would like us to explore how far we can go with that.
