
AyodeAwe
Contributor

❄️ Code freeze for branch-25.08 and v25.08 release

What does this mean?

Only critical/hotfix-level issues should be merged into branch-25.08 until the release (the merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-25.08 into main for the release

raydouglass and others added 30 commits April 30, 2025 15:08
Forward-merge branch-25.06 into branch-25.08 (7 commits)
Contributes to rapidsai/build-planning#181

* removes all uploads of conda packages and wheels to `downloads.rapids.ai`

## Notes for Reviewers

### How I identified changes

I looked for uses of the relevant `gha-tools` tools, as well as documentation mentioning `downloads.rapids.ai`, the NVIDIA VPN, S3, and so on, with searches like this:

```shell
git grep -i -E 's3|upload|downloads\.rapids|vpn'
```

### How I tested this

See "How I tested this" on rapidsai/shared-workflows#364

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: #1929
This PR removes CUDA 11 devcontainers and updates CI scripts.

xref: rapidsai/build-planning#184

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1933
Forward-merge branch-25.06 into branch-25.08 (3 commits)
Temporarily skipping a test to unblock CI. Failures on HMM systems are
being diagnosed in #1935 and #1944.
The `REPLAY_BENCH` benchmark is used for replaying logs of allocation patterns.

However, unless one passes `--benchmark_repetitions=1` and `--benchmark_min_time=0s`, the replay hangs forever after the first warmup iteration. The problem is that a shared `event_index` needs to be reset for each benchmark iteration.

Additionally, if running a multi-threaded allocation replay, there is a race condition between thread 0 setting up and tearing down the memory resource being used, and any other threads running through their allocation pattern.

To fix both problems, now that CUDA 11 is no longer supported, we require C++20 to compile the benchmarks and use a `std::barrier` to ensure ordering between `SetUp`/`TearDown` on thread 0 and the actual benchmark iteration.

To handle the `event_index` problem, we again use barriers for sequencing. Thread 0 resets the `event_index` at the beginning of each benchmark iteration and then everyone waits at a barrier.

- Closes #1939

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Rong Ou (https://github.com/rongou)
  - Bradley Dice (https://github.com/bdice)

URL: #1940
Minor documentation fix for the Python package path.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1936
Extend replay benchmark to include managed memory resource.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1938
`rmm` is piloting the new branching strategy
(https://docs.rapids.ai/notices/rsn0047/). This PR updates the branches
that trigger nightly and branch builds under the new strategy.
This adds an env var to pass the GitHub token through to the telemetry summary shared action. The token is needed to check whether the base artifact exists. See rapidsai/shared-actions#56 for more information.

The whitespace changes here were introduced by using `yq` with `rapids-reviser` to add this field. If the whitespace changes are undesirable, I will revert them.

Authors:
  - Mike Sarahan (https://github.com/msarahan)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1956
This pull request implements the reverse iterators for the `device_uvector` type.
Closes #1326
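A minimal sketch of how reverse iterators are commonly layered on top of pointer-based forward iterators. `host_uvector` is a hypothetical host-memory container, not `rmm::device_uvector`, and the sketch uses `std::reverse_iterator`; the adaptor the PR actually uses for device memory may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical host-memory container whose iterators are raw pointers,
// mimicking the shape of a uvector; it is not rmm::device_uvector.
template <typename T>
class host_uvector {
 public:
  using value_type             = T;
  using size_type              = std::size_t;
  using iterator               = T*;
  using const_iterator         = T const*;
  using reverse_iterator       = std::reverse_iterator<iterator>;
  using const_reverse_iterator = std::reverse_iterator<const_iterator>;

  explicit host_uvector(size_type size) : storage_(size) {}

  T& operator[](size_type i)
  {
    assert(i < size());  // debug-only bounds check
    return storage_[i];
  }

  size_type size() const noexcept { return storage_.size(); }

  iterator begin() noexcept { return storage_.data(); }
  iterator end() noexcept { return storage_.data() + storage_.size(); }
  const_iterator cbegin() const noexcept { return storage_.data(); }
  const_iterator cend() const noexcept { return storage_.data() + storage_.size(); }

  // Reverse iterators adapt the forward ones: rbegin() wraps end() and
  // dereferences to the last element; rend() wraps begin().
  reverse_iterator rbegin() noexcept { return reverse_iterator(end()); }
  reverse_iterator rend() noexcept { return reverse_iterator(begin()); }
  const_reverse_iterator crbegin() const noexcept { return const_reverse_iterator(cend()); }
  const_reverse_iterator crend() const noexcept { return const_reverse_iterator(cbegin()); }

 private:
  std::vector<T> storage_;
};
```

Copying `[rbegin(), rend())` yields the elements in reverse order. For a real device container, such iterators would be handed to device-aware algorithms rather than dereferenced on the host.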

Authors:
  - Basit Ayantunde (https://github.com/lamarrr)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1937
Updates the vendored `cxxopts.hpp` to version 3.3.1.

xref: #1951 (comment)

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)
  - Mark Harris (https://github.com/harrism)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1962
Make RMM easier to build with LLVM:
- add missing `#include`s;
- do not require NVTX headers when RMM is configured not to use NVTX.

Closes #1948.
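A minimal sketch of the second change, assuming the NVTX dependency sits behind a preprocessor switch. `EXAMPLE_USE_NVTX`, `EXAMPLE_FUNC_RANGE`, and `annotated_work` are hypothetical names, not RMM's actual macros; `NVTX3_FUNC_RANGE` comes from the NVTX3 C++ header.

```cpp
// EXAMPLE_USE_NVTX is a hypothetical build flag standing in for the project's
// real NVTX option. When it is not defined, <nvtx3/nvtx3.hpp> is never
// included, so the NVTX headers do not need to be installed at all.
#ifdef EXAMPLE_USE_NVTX
#include <nvtx3/nvtx3.hpp>
// Annotate the enclosing function with an NVTX range named after it.
#define EXAMPLE_FUNC_RANGE() NVTX3_FUNC_RANGE()
#else
// Without NVTX the annotation expands to nothing.
#define EXAMPLE_FUNC_RANGE()
#endif

int annotated_work(int x)
{
  EXAMPLE_FUNC_RANGE();
  return x * 2;
}
```

Built without `-DEXAMPLE_USE_NVTX`, this translation unit compiles with no NVTX headers on the include path, and the annotation becomes an empty statement.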

Authors:
  - https://github.com/vitor1001
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Mark Harris (https://github.com/harrism)

URL: #1951
Erasing from `allocations_` invalidates the iterator `found`, so dereferencing `found` afterwards is undefined behaviour. In practice, we were seeing badly corrupted tracking data.
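A minimal sketch of the bug and the fix, assuming `allocations_` is an associative container such as `std::map` keyed by pointer. Apart from `allocations_` and `found`, the names here (`allocation_tracker`, `record`, `release`, `total_bytes_`) are hypothetical. The fix is to read everything needed through `found` before calling `erase`.

```cpp
#include <cstddef>
#include <map>

// Hypothetical tracker: maps each live pointer to the size of its allocation.
struct allocation_tracker {
  std::map<void*, std::size_t> allocations_;
  std::size_t total_bytes_ = 0;

  void record(void* ptr, std::size_t bytes)
  {
    allocations_.emplace(ptr, bytes);
    total_bytes_ += bytes;
  }

  // Returns the size of the released allocation, or 0 if `ptr` is unknown.
  std::size_t release(void* ptr)
  {
    auto found = allocations_.find(ptr);
    if (found == allocations_.end()) { return 0; }
    // Buggy order: calling allocations_.erase(found) first and then reading
    // found->second dereferences an invalidated iterator.
    std::size_t const bytes = found->second;  // read what is needed first...
    allocations_.erase(found);                // ...then erase
    total_bytes_ -= bytes;
    return bytes;
  }
};
```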

closes #1965

Authors:
  - Clement Courbet (https://github.com/legrosbuffle)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1966
This is Undefined Behaviour.

closes #1967

Authors:
  - Clement Courbet (https://github.com/legrosbuffle)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Rong Ou (https://github.com/rongou)

URL: #1968
jameslamb and others added 12 commits June 27, 2025 20:21
Testing the changes from rapidsai/gha-tools#196, which contribute to rapidsai/shared-workflows#377

I'm proposing that we **merge this as-is**, to test that these changes work in the following situations on `main`:

* `branch` build triggered by a merge
* manually-triggered `nightly` test run

Then merge a follow-up PR reverting all of this, after rapidsai/gha-tools#196 is merged.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #1972
Reverts #1972, which was just merged to test the changes from rapidsai/gha-tools#196 for `branch` / `nightly` builds.

Created like this:

```shell
git revert d07133f
```

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #1974
Use CUDA 12.9 throughout different build and test environments.

Authors:
  - https://github.com/jakirkham

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: #1973
Contributes to rapidsai/shared-workflows#376

* adds descriptions for all inputs to workflows triggered by `workflow_dispatch`

## Notes for Reviewers

### Motivation

The input descriptions show up in the UI when you go to trigger these workflows, like this:

![image](https://github.com/user-attachments/assets/fc62d1ff-39eb-47c7-9a21-57aab959e64f)

I'm hoping that will make it easier for developers to manually trigger workflows. Inspired by being asked multiple times "what format is `date` supposed to be in?".

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1975
Continues from #1896.

Contributes to #1779.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Rong Ou (https://github.com/rongou)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #1980
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.4 → v0.12.2](astral-sh/ruff-pre-commit@v0.11.4...v0.12.2)
- [github.com/MarcoGorelli/cython-lint: v0.16.6 → v0.16.7](MarcoGorelli/cython-lint@v0.16.6...v0.16.7)
- [github.com/pre-commit/mirrors-clang-format: v20.1.4 → v20.1.7](pre-commit/mirrors-clang-format@v20.1.4...v20.1.7)
- [github.com/rapidsai/pre-commit-hooks: v0.6.0 → v0.7.0](rapidsai/pre-commit-hooks@v0.6.0...v0.7.0)
- [github.com/rapidsai/dependency-file-generator: v1.18.1 → v1.19.1](rapidsai/dependency-file-generator@v1.18.1...v1.19.1)

Authors:
  - https://github.com/apps/pre-commit-ci
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1979
I am not sure if this is the correct method to ensure librmm is built and found when building / installing the rmm Python package, but this has worked for me.

Fixes #1977

Authors:
  - Graham Markall (https://github.com/gmarkall)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1978
This PR updates RMM to require CUDA 12.0+. It drops version checks for CUDA versions older than 12.0 and updates some enums to use values defined in CUDA 12.0+.

Closes #1745.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Tom Augspurger (https://github.com/TomAugspurger)
  - Yunsong Wang (https://github.com/PointKernel)
  - Shruti Shivakumar (https://github.com/shrshi)
  - Matthew Murray (https://github.com/Matt711)

URL: #1984
…1987)

In rapidsai/build-planning#187 we switched the Docker image tagging scheme
over to include the CalVer information. This was done to allow us to make
changes to the images during burndown without breaking release pipelines.

This PR moves all of the existing `latest` tags to the newer versioned tag
`25.08-latest` and also modifies the `update-version.sh` script to bump
that version at branch creation time.

xref: rapidsai/build-planning#187

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1987
Removes a no-op line from `update-version.sh`.

xref #1987

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1989
This reverts commit db325e6.

We've rolled back most of the changes associated with the new branching model, but this also needs to be reverted.

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1988
Closes #1318.

This uses `device_uvector<T>::size_type` instead of hardcoding `std::size_t` in its implementation.
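A minimal sketch of the pattern, using a hypothetical `indexed_buffer` rather than `device_uvector` itself: sizes and indices are expressed through the member alias `size_type`, so the interface and implementation stay consistent if the alias ever changes.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical container: every size and index goes through `size_type`
// instead of a hardcoded std::size_t.
template <typename T>
class indexed_buffer {
 public:
  using size_type = std::size_t;  // single source of truth for sizes

  explicit indexed_buffer(size_type size) : data_(size) {}

  [[nodiscard]] size_type size() const noexcept
  {
    return static_cast<size_type>(data_.size());
  }

  T* element_ptr(size_type index) noexcept
  {
    assert(index < size());  // debug-only bounds check
    return data_.data() + index;
  }

 private:
  std::vector<T> data_;
};

// The public API reports sizes in the container's own size_type.
static_assert(std::is_same_v<decltype(std::declval<indexed_buffer<int>&>().size()),
                             indexed_buffer<int>::size_type>);
```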

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - David Wendt (https://github.com/davidwendt)

URL: #1992
@AyodeAwe AyodeAwe requested review from a team as code owners July 24, 2025 17:07
@AyodeAwe AyodeAwe requested review from bdice and removed request for a team July 24, 2025 17:07
copy-pr-bot bot commented Jul 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@AyodeAwe AyodeAwe requested review from shrshi and removed request for a team July 24, 2025 17:07
@github-actions github-actions bot added CMake Python Related to RMM Python API conda ci labels Jul 24, 2025
@AyodeAwe AyodeAwe merged commit 9b3441c into main Aug 6, 2025
5 of 7 checks passed
Projects
Status: Done