FIX Propagate random state to numpy rng in `make_classification` #6518

betatim · 2025-04-08T07:24:51Z

When generating a classification problem a mix of cupy and numpy random generators is used. The random state needs to be propagated to the numpy generator which was not happening.

I'm not sure what the best way of doing the "propagation" is. There is prior art for using randint and then using that as a seed. We don't need the sequences to be the same, we just need a way to initialise the Numpy random generator.

An alternative would be to use the cupy rng throughout and combine it with moving the data to a numpy array.
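For illustration, the `randint`-as-seed approach mentioned above might look roughly like the following. This is a minimal sketch, not the code in this PR: `derive_numpy_rng` and `sample_pair` are hypothetical names, and a NumPy `RandomState` stands in for the cupy generator so the snippet runs without a GPU.

```python
import numpy as np


def derive_numpy_rng(parent_rng):
    """Seed a fresh NumPy RandomState from one integer drawn from parent_rng.

    parent_rng is assumed to expose a RandomState-like ``randint`` method.
    """
    # One draw from the parent is enough: the two sequences do not need to
    # match, the child only needs to be determined by the parent's state.
    seed = int(parent_rng.randint(0, 2**31 - 1))
    return np.random.RandomState(seed)


def sample_pair(random_state):
    """Draw from a parent generator and from a child seeded by it."""
    # Stand-in for the cupy generator (assumption made for this sketch).
    parent = np.random.RandomState(random_state)
    child = derive_numpy_rng(parent)
    return parent.rand(3), child.rand(3)


a1, b1 = sample_pair(0)
a2, b2 = sample_pair(0)
```

With the same `random_state`, both the parent and the derived draws repeat, which is the property the fix is after.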

closes #6510

copy-pr-bot · 2025-04-08T07:24:54Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

betatim · 2025-04-08T07:37:06Z

/ok to test

When generating a classification problem a mix of cupy and numpy random generators is used. The random state needs to be propagated to the numpy generator.

betatim · 2025-04-08T07:48:37Z

(Force pushed a few times until I figured out why my commits weren't signed)

viclafargue

LGTM, just small comment

python/cuml/cuml/datasets/classification.py

betatim · 2025-04-09T14:13:04Z

Ok, I'll leave it like this for now.

This is ready for re-running the CI (when the cudf problems have been solved) and then merge.

New random numbers change what is expected

csadorf

Do you have any hypothesis as to why we are using a mix of numpy and cupy in the first place? Why not use cupy also to generate the hypercube?

Mark the following tests as flaky in cuml.accel test suite since they are showing flaky behavior:
- sklearn.cluster.tests.test_k_means::test_score_max_iter[42-KMeans]
- sklearn.manifold.tests.test_t_sne::test_optimization_minimizes_kl_divergence

Spun off from #6518 for immediate merge.

Authors:
- Simon Adorf (https://github.com/csadorf)
- Tim Head (https://github.com/betatim)

Approvers:
- Jim Crist-Harif (https://github.com/jcrist)

URL: #6598

csadorf

I have one question, but could be addressed in a follow-up.

betatim · 2025-04-29T08:44:26Z

/merge

betatim · 2025-04-29T08:50:59Z

Do you have any hypothesis as to why we are using a mix of numpy and cupy in the first place? Why not use cupy also to generate the hypercube?

Good question, from looking at increasingly old "git blame"s I landed at d0a0e72 which seems to be when this was added. It already used Numpy back then. I couldn't work out how to find the PR related to this commit (I totally rely on the PR number being in the commit :(), but maybe if we can find the PR we can see if there was any discussion?

My suspicion is that something that `_generate_hypercube` needs is missing from cupy or would be tricky to implement (maybe `sample_without_replacement`?).
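To make the dependency concrete, here is a rough NumPy-only sketch of what sampling distinct hypercube vertices involves. It is not cuML's implementation: `sample_hypercube_vertices` is a hypothetical name, and `RandomState.choice(..., replace=False)` is used as a stand-in for `sample_without_replacement`. It is only meant for small `n_dims`, since `choice` without replacement works over the whole population of `2**n_dims` codes.

```python
import numpy as np


def sample_hypercube_vertices(n_vertices, n_dims, rng):
    """Return n_vertices distinct corners of the n_dims-dimensional unit cube."""
    # Draw distinct integers in [0, 2**n_dims); each one encodes a vertex.
    codes = rng.choice(2**n_dims, size=n_vertices, replace=False)
    # Expand every integer into its n_dims binary digits
    # (least significant bit first).
    return (codes[:, None] >> np.arange(n_dims)) & 1


rng = np.random.RandomState(0)
vertices = sample_hypercube_vertices(4, 3, rng)
```

The sampling-without-replacement step is what guarantees the vertices are distinct, which is the part suspected above to be awkward on the cupy side.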

csadorf · 2025-04-29T13:45:25Z

Ok, I'm happy to let that be for now. Once we start more rigorous profiling we can identify the low hanging fruit for optimization like that.

…idsai#6518)

When generating a classification problem a mix of cupy and numpy random generators is used. The random state needs to be propagated to the numpy generator which was not happening.

I'm not sure what the best way of doing the "propagation" is. There is prior art for using `randint` and then using that as a seed. We don't need the sequences to be the same, we just need a way to initialise the Numpy random generator.

An alternative would be to use the cupy rng throughout and combine it with moving the data to a numpy array.

closes rapidsai#6510

Authors:
- Tim Head (https://github.com/betatim)
- Simon Adorf (https://github.com/csadorf)

Approvers:
- Simon Adorf (https://github.com/csadorf)

URL: rapidsai#6518

betatim requested a review from a team as a code owner April 8, 2025 07:24

betatim requested review from vyasr and jcrist April 8, 2025 07:24

github-actions bot added the Cython / Python Cython or Python issue label Apr 8, 2025

betatim added the bug Something isn't working label Apr 8, 2025

betatim force-pushed the fix-make_classification-random_state branch 2 times, most recently from ff50d69 to 00eb330 Compare April 8, 2025 07:36

Propagate random state to numpy rng

94bfe15

When generating a classification problem a mix of cupy and numpy random generators is used. The random state needs to be propagated to the numpy generator.

betatim force-pushed the fix-make_classification-random_state branch from 00eb330 to 94bfe15 Compare April 8, 2025 07:48

viclafargue reviewed Apr 8, 2025

python/cuml/cuml/datasets/classification.py (outdated, resolved)

Use existing helper

41f9217

betatim added the breaking Breaking change label Apr 8, 2025

viclafargue reviewed Apr 8, 2025

python/cuml/cuml/datasets/classification.py (outdated, resolved)

Use safe import

1404688

Merge branch 'branch-25.06' into fix-make_classification-random_state

dccf46b

betatim force-pushed the fix-make_classification-random_state branch from 360d603 to dccf46b Compare April 28, 2025 07:47

betatim added 6 commits April 28, 2025 10:13

No more conditional imports

b8e0fa1

Fix expectation for tests

2b60ea1

New random numbers change what is expected

Different random numbers

1436146

Need to use a numpy random state

29d5f9c

Update xfailed list for cuml.accel tests

6f9f038

Mark test as flakey, not as passing

bb55d36

csadorf reviewed Apr 28, 2025

csadorf mentioned this pull request Apr 28, 2025

Mark one kmeans and one t_sne sklearn test as flaky #6598

Merged

csadorf approved these changes Apr 28, 2025

Merge branch 'branch-25.06' into fix-make_classification-random_state

dbe3c3c

rapids-bot bot merged commit 5add36e into rapidsai:branch-25.06 Apr 29, 2025
74 checks passed

betatim deleted the fix-make_classification-random_state branch April 29, 2025 08:44
