Massive improved speed of wilcoxon signed rank - from 2 hours to 12 minutes! #9

Open
wants to merge 14 commits into
base: main

LucaCappelletti94
Contributor

I am running a lot of Wilcoxon tests, so I need them to be as fast as possible. So far I have obtained a small speed improvement. The original performance was:

Wilcoxon signed-rank test
      time:   [2.6232 ms 2.6244 ms 2.6259 ms]

and the updated performance is now:

Wilcoxon signed-rank test
      time:   [2.3477 ms 2.3498 ms 2.3522 ms]

So that is a speedup of about 10%. Not much, but worth noting.

@LucaCappelletti94
Contributor Author

FYI, in my practical use case this change reduced the wall time from 2 h 20 min to 1 h 27 min, a reduction of about 38%, so well beyond the 10% seen in the benchmark.

@LucaCappelletti94
Contributor Author

I have obtained some more improvements. The benchmark now shows:

Wilcoxon signed-rank test
          time:   [2.2546 ms 2.2556 ms 2.2570 ms]

This is an overall speedup of about 14% compared to the original. In my use case, which has much larger sample sizes than those I employ in this benchmark, the speedup is massive: from the initial 2 h 20 min, it now stands at just 12 minutes!
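For readers who want to check the figures quoted so far, here is a minimal sketch of the arithmetic. The helper names `percent_reduction` and `speedup_factor` are illustrative and not part of the crate; the inputs are the middle values of the benchmark output above and the wall times mentioned in the comments.

```rust
/// Percentage of run time saved when going from `before` to `after`
/// (both in the same unit).
fn percent_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// How many times faster `after` is compared with `before`.
fn speedup_factor(before: f64, after: f64) -> f64 {
    before / after
}

fn main() {
    // Middle values of the benchmark output, in milliseconds.
    let original = 2.6244;
    let first_pass = 2.3498;
    let second_pass = 2.2556;

    // Prints 10.5 and 14.1 respectively.
    println!("first pass:  {:.1}% less time", percent_reduction(original, first_pass));
    println!("second pass: {:.1}% less time", percent_reduction(original, second_pass));

    // Wall time of the real workload in minutes: 2 h 20 min versus 12 min.
    // Prints 11.7.
    println!("real workload: {:.1}x faster", speedup_factor(140.0, 12.0));
}
```

The gap between the small-benchmark gain and the real-workload gain is consistent with the author's remark that the workload uses much larger sample sizes.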

LucaCappelletti94 changed the title from "Slightly improved speed of wilcoxon signed rank" to "Massive improved speed of wilcoxon signed rank - from 2 hours to 12 minutes!" on Aug 18, 2024
@LucaCappelletti94
Contributor Author

The command to run the benchmark is:

RUSTFLAGS='-C target-cpu=native' cargo bench --bench wilcoxon

Having now increased the size of the array, the original version performs as follows:

Wilcoxon signed-rank test
       time:   [85.022 ms 85.069 ms 85.130 ms]

While the improved version achieves the following:

Wilcoxon signed-rank test
        time:   [48.109 ms 48.121 ms 48.136 ms]

I believe that the larger the sample array, the more significant the time improvement; in my case there are several million samples.

@larsgw
Owner

larsgw commented Aug 20, 2024

Thank you so much, again! I have not worked on this in three years, and have not used a lot of Rust since, so I might need a bit more time to get back up to speed.

@LucaCappelletti94
Contributor Author

Not to worry, your code is understandable! Otherwise, I could not have optimized it so quickly.

@LucaCappelletti94
Contributor Author

New updates: I have added support for f32 and signed integers, plus an optional feature for voracious sorting. Next I will add methods allowing for quantization. Here are the new benchmark results:

| Test Suite | Time (ms) | Outliers (%) | Outlier Breakdown |
|---|---|---|---|
| sort_unstable_f64 | 56.705 | 14.00% | 2 high mild, 12 high severe |
| voracious_f64 | 38.327 | 15.00% | 10 high mild, 5 high severe |
| sort_unstable_f32 | 56.685 | 13.00% | 1 high mild, 12 high severe |
| voracious_f32 | 28.051 | 4.00% | 2 high mild, 2 high severe |
| sort_unstable_i64 | 38.761 | 8.00% | 5 high mild, 3 high severe |
| voracious_i64 | 38.123 | 13.00% | 4 high mild, 9 high severe |
| sort_unstable_i32 | 37.821 | 10.00% | 7 high mild, 3 high severe |
| voracious_i32 | 24.343 | 9.00% | 5 high mild, 4 high severe |
| sort_unstable_i16 | 51.988 | 29.00% | 3 low severe, 18 low mild, 2 high mild, 6 high severe |
| voracious_i16 | 17.262 | 3.00% | 2 high mild, 1 high severe |
| sort_unstable_i8 | 27.630 | 16.00% | 3 high mild, 13 high severe |
| voracious_i8 | 10.659 | 1.00% | 1 high mild |
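To show why the element type and the sorting routine matter so much here, the following is a minimal std-only sketch of the core of a signed-rank computation: take the paired differences, sort them by absolute value, assign average ranks, and sum the ranks of the positive differences. It is not the code from this PR. The function names are hypothetical, it uses the standard library's `sort_unstable_*` rather than the voracious radix sort, and it only computes the W+ statistic, not a p-value.

```rust
/// Average 1-based ranks for an already sorted slice; tied values share
/// the mean of the ranks they span.
fn average_ranks_sorted<T: PartialEq>(sorted: &[T]) -> Vec<f64> {
    let mut ranks = vec![0.0; sorted.len()];
    let mut start = 0;
    while start < sorted.len() {
        let mut end = start + 1;
        while end < sorted.len() && sorted[end] == sorted[start] {
            end += 1;
        }
        // Positions start..end hold ranks start+1..=end; this is their mean.
        let mean_rank = (start + 1 + end) as f64 / 2.0;
        for r in &mut ranks[start..end] {
            *r = mean_rank;
        }
        start = end;
    }
    ranks
}

/// W+ for paired f64 samples. Zero differences are dropped.
fn signed_rank_f64(x: &[f64], y: &[f64]) -> f64 {
    let mut deltas: Vec<f64> =
        x.iter().zip(y).map(|(a, b)| a - b).filter(|d| *d != 0.0).collect();
    // Floats are not `Ord`, so the sort needs an explicit comparator.
    deltas.sort_unstable_by(|a, b| a.abs().total_cmp(&b.abs()));
    let abs: Vec<f64> = deltas.iter().map(|d| d.abs()).collect();
    let ranks = average_ranks_sorted(&abs);
    deltas.iter().zip(&ranks).filter(|(d, _)| **d > 0.0).map(|(_, r)| *r).sum()
}

/// W+ for paired i32 samples. Differences are widened to i64 to avoid overflow.
fn signed_rank_i32(x: &[i32], y: &[i32]) -> f64 {
    let mut deltas: Vec<i64> = x
        .iter()
        .zip(y)
        .map(|(a, b)| i64::from(*a) - i64::from(*b))
        .filter(|d| *d != 0)
        .collect();
    // Integer keys are `Ord`, so a plain key sort works here.
    deltas.sort_unstable_by_key(|d| d.unsigned_abs());
    let abs: Vec<u64> = deltas.iter().map(|d| d.unsigned_abs()).collect();
    let ranks = average_ranks_sorted(&abs);
    deltas.iter().zip(&ranks).filter(|(d, _)| **d > 0).map(|(_, r)| *r).sum()
}

fn main() {
    // Differences are 2, -3, 0, 4, -1, -2, so W+ is 2.5 + 5 = 7.5 in both cases.
    let wf = signed_rank_f64(
        &[10.0, 12.0, 9.0, 15.0, 7.0, 5.0],
        &[8.0, 15.0, 9.0, 11.0, 8.0, 7.0],
    );
    let wi = signed_rank_i32(&[10, 12, 9, 15, 7, 5], &[8, 15, 9, 11, 8, 7]);
    println!("W+ (f64) = {wf}, W+ (i32) = {wi}");
}
```

In this sketch everything outside the sort is a linear pass, so the sort dominates for large inputs. Integer keys also admit non-comparison (radix-style) sorts, which is presumably part of why the narrower integer types fare so well with the voracious variant in the table above.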

@LucaCappelletti94
Contributor Author

I have now introduced the quantized variant, which lets a user specify a signed integer target type for quantizing the deltas of the Wilcoxon test. This works because the Wilcoxon test does not depend on the absolute magnitudes of the deltas, only on their signs and on the relative ordering of their absolute values, so dividing all deltas by a positive scalar does not change the results. Of course, switching to integers allows for significantly faster sorting. Here are the benchmarks:

| Test Suite | Time (ms) | Outliers (%) | Outlier Breakdown |
|---|---|---|---|
| sort_unstable_f32 | 56.353 | 8.00% | 8 high severe |
| voracious_f32 | 31.818 | 5.00% | 4 high mild, 1 high severe |
| quantized_sort_unstable_f32_to_i8 | 28.876 | 10.00% | 2 high mild, 8 high severe |
| quantized_voracious_f32_to_i8 | 12.791 | 3.00% | 3 low mild |
| quantized_sort_unstable_f32_to_i16 | 53.360 | 13.00% | 3 high mild, 10 high severe |
| quantized_voracious_f32_to_i16 | 18.349 | 3.00% | 2 high mild, 1 high severe |
| sort_unstable_f64 | 55.714 | 16.00% | 5 high mild, 11 high severe |
| voracious_f64 | 38.008 | 10.00% | 7 high mild, 3 high severe |
| quantized_sort_unstable_f64_to_i8 | 28.964 | 9.00% | 2 high mild, 7 high severe |
| quantized_voracious_f64_to_i8 | 13.601 | 2.00% | 1 high mild, 1 high severe |
| quantized_sort_unstable_f64_to_i16 | 53.457 | 10.00% | 4 high mild, 6 high severe |
| quantized_voracious_f64_to_i16 | 18.639 | 5.00% | 2 high mild, 3 high severe |
| quantized_sort_unstable_f64_to_i32 | 39.384 | 9.00% | 4 high mild, 5 high severe |
| quantized_voracious_f64_to_i32 | 29.309 | 7.00% | 3 high mild, 4 high severe |
| sort_unstable_i64 | 38.498 | 10.00% | 10 high mild |
| voracious_i64 | 36.825 | 10.00% | 6 high mild, 4 high severe |
| sort_unstable_i32 | 36.873 | 12.00% | 6 high mild, 6 high severe |
| voracious_i32 | 24.377 | 14.00% | 2 low severe, 6 low mild, 5 high mild, 1 high severe |
| sort_unstable_i16 | 49.385 | 16.00% | 8 high mild, 8 high severe |
| voracious_i16 | 15.154 | 1.00% | 1 high mild |
| sort_unstable_i8 | 24.917 | 19.00% | 8 high mild, 11 high severe |
| voracious_i8 | 10.538 | 7.00% | 1 low severe, 6 high mild |
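The scale-invariance argument behind the quantized variant can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the function names are hypothetical, and the quantization rule (scale so that the largest absolute delta maps to `i16::MAX`, then round) is an assumption, since the thread does not spell out how the PR picks its scale.

```rust
/// W+ (sum of ranks of positive deltas) from `(|delta|, delta > 0)` pairs
/// that are already sorted by `|delta|`. Ties receive average ranks.
fn w_plus_from_sorted<T: PartialEq>(sorted: &[(T, bool)]) -> f64 {
    let mut total = 0.0;
    let mut start = 0;
    while start < sorted.len() {
        let mut end = start + 1;
        while end < sorted.len() && sorted[end].0 == sorted[start].0 {
            end += 1;
        }
        // Mean of the 1-based ranks start+1..=end shared by this tie group.
        let mean_rank = (start + 1 + end) as f64 / 2.0;
        let positives = sorted[start..end].iter().filter(|p| p.1).count();
        total += mean_rank * positives as f64;
        start = end;
    }
    total
}

/// W+ computed directly on floating-point deltas (zeros dropped).
fn w_plus_f64(deltas: &[f64]) -> f64 {
    let mut pairs: Vec<(f64, bool)> = deltas
        .iter()
        .filter(|d| **d != 0.0)
        .map(|d| (d.abs(), *d > 0.0))
        .collect();
    pairs.sort_unstable_by(|a, b| a.0.total_cmp(&b.0));
    w_plus_from_sorted(&pairs)
}

/// Assumed quantization rule: map the largest |delta| to i16::MAX and round.
fn quantize_to_i16(deltas: &[f64]) -> Vec<i16> {
    let max_abs = deltas.iter().fold(0.0_f64, |m, d| m.max(d.abs()));
    if max_abs == 0.0 {
        return vec![0; deltas.len()];
    }
    let scale = f64::from(i16::MAX) / max_abs;
    deltas.iter().map(|d| (d * scale).round() as i16).collect()
}

/// W+ computed on quantized integer deltas (zeros dropped).
fn w_plus_i16(deltas: &[i16]) -> f64 {
    let mut pairs: Vec<(u16, bool)> = deltas
        .iter()
        .filter(|d| **d != 0)
        .map(|d| (d.unsigned_abs(), *d > 0))
        .collect();
    pairs.sort_unstable_by_key(|p| p.0);
    w_plus_from_sorted(&pairs)
}

fn main() {
    let deltas = [0.5, -1.25, 2.0, -0.75, 3.5];
    // Dividing by a positive scalar keeps signs and ordering, hence W+.
    let scaled: Vec<f64> = deltas.iter().map(|d| d / 7.0).collect();
    println!("original:  {}", w_plus_f64(&deltas));
    println!("scaled:    {}", w_plus_f64(&scaled));
    println!("quantized: {}", w_plus_i16(&quantize_to_i16(&deltas)));
}
```

All three calls agree on this input (W+ = 10). One caveat: the invariance is exact for the scaling step, but rounding to a narrow integer type can merge nearby deltas into ties or zeros. For example, with this rule the deltas `[1.0, -1.00001, 100000.0]` quantize to `[0, 0, 32767]`, so the quantized statistic differs from the exact one. Whether that matters depends on the spread of the data and the target type.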

@LucaCappelletti94
Contributor Author

So, the initial improvement brought the Wilcoxon test part of my benchmark from 2 hours to 12 minutes, which allowed me to increase the number of tests to consider. After I added the new tests, the time requirements ballooned to at least 5 hours (I didn't let it finish). Now the run time seems to be around 2 hours again, which suggests the improvement over the starting time requirements should be around 50x!

@Licenser Licenser left a comment

Throwing in a few thoughts :)

src/traits/one.rs (resolved)
src/traits/zero.rs (resolved)
src/test/wilcoxon_w.rs (resolved)
src/statistics/ranks.rs (resolved)
src/statistics/ranks.rs (resolved)
@LucaCappelletti94
Contributor Author

@larsgw do check out this PR when you have time. I think this is as good as it will get performance-wise.

This was referenced Sep 23, 2024
@larsgw
Owner

larsgw commented Oct 1, 2024

I am still very thankful, but I simply do not have the time to look at this in the coming weeks, sorry.

3 participants