Massive improved speed of wilcoxon signed rank - from 2 hours to 12 minutes! #9

LucaCappelletti94 · 2024-08-18T10:29:28Z

I am running a lot of Wilcoxon tests, so I need them to be as fast as possible. So far I have obtained a small speed improvement. The original performance was:

Wilcoxon signed-rank test
      time:   [2.6232 ms 2.6244 ms 2.6259 ms]

and the updated performance is now:

Wilcoxon signed-rank test
      time:   [2.3477 ms 2.3498 ms 2.3522 ms]

So about a solid 10% speedup. Not much, but it is something of note.

LucaCappelletti94 · 2024-08-18T10:35:12Z

FYI, in my practical use case this change reduced the effective wall time from 2:20 hours to 1:27 hours, a reduction of about 38%, so well above the 10% seen in the benchmark.

LucaCappelletti94 · 2024-08-18T14:52:16Z

I have obtained some more improvements, and the benchmark now shows:

Wilcoxon signed-rank test
          time:   [2.2546 ms 2.2556 ms 2.2570 ms]

This is an overall speedup of about 14% compared to the original. In my use case, which has much larger sample sizes than those I employ in this benchmark, the speedup is massive: from the initial 2:20 hours, it now stands at just 12 minutes!

LucaCappelletti94 · 2024-08-18T15:05:09Z

The command to run the bench is `RUSTFLAGS='-C target-cpu=native' cargo bench --bench wilcoxon`.

Having now increased the size of the array, the original version performs as follows:

Wilcoxon signed-rank test
       time:   [85.022 ms 85.069 ms 85.130 ms]

While the improved version achieves the following:

Wilcoxon signed-rank test
        time:   [48.109 ms 48.121 ms 48.136 ms]

I believe that the larger the sample array, the more significant the time improvement, and in my case there are several million samples.
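For reference, the percentages quoted in this thread follow from the middle value of each reported interval. A minimal sketch of that arithmetic, where the function names are purely illustrative and not part of the crate:

```rust
/// Relative time reduction, as a percentage of the baseline time.
fn reduction_percent(baseline_ms: f64, improved_ms: f64) -> f64 {
    (baseline_ms - improved_ms) / baseline_ms * 100.0
}

/// Speedup factor: how many times faster the improved version runs.
fn speedup_factor(baseline_ms: f64, improved_ms: f64) -> f64 {
    baseline_ms / improved_ms
}
```

With the values above, the small benchmark goes from 2.6244 ms to 2.2556 ms (about 14% less time, 1.16x), and the enlarged one from 85.069 ms to 48.121 ms (about 43% less time, 1.77x).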

larsgw · 2024-08-20T10:16:38Z

Thank you so much, again! I have not worked on this in three years, and have not used a lot of Rust since, so I might need a bit more time to get back up to speed.

LucaCappelletti94 · 2024-08-20T10:19:07Z

Not to worry, your code is understandable! Otherwise, I could not have optimized it so quickly.

LucaCappelletti94 · 2024-08-23T07:31:09Z

New updates: I have added support for f32 and signed integers, plus an optional feature for voracious sorting. I will now be adding methods that allow for quantization. These are the new benchmark results:

| Test Suite | Time (ms) | Outliers (%) | Outlier Breakdown |
|---|---|---|---|
| sort_unstable_f64 | 56.705 | 14.00% | 2 high mild, 12 high severe |
| voracious_f64 | 38.327 | 15.00% | 10 high mild, 5 high severe |
| sort_unstable_f32 | 56.685 | 13.00% | 1 high mild, 12 high severe |
| voracious_f32 | 28.051 | 4.00% | 2 high mild, 2 high severe |
| sort_unstable_i64 | 38.761 | 8.00% | 5 high mild, 3 high severe |
| voracious_i64 | 38.123 | 13.00% | 4 high mild, 9 high severe |
| sort_unstable_i32 | 37.821 | 10.00% | 7 high mild, 3 high severe |
| voracious_i32 | 24.343 | 9.00% | 5 high mild, 4 high severe |
| sort_unstable_i16 | 51.988 | 29.00% | 3 low severe, 18 low mild, 2 high mild, 6 high severe |
| voracious_i16 | 17.262 | 3.00% | 2 high mild, 1 high severe |
| sort_unstable_i8 | 27.630 | 16.00% | 3 high mild, 13 high severe |
| voracious_i8 | 10.659 | 1.00% | 1 high mild |
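To illustrate what supporting several numeric types involves, here is a minimal sketch of sorting deltas by absolute value generically over floats and signed integers. The `AbsOrd` trait and `sort_by_abs` function are hypothetical names for illustration, not the crate's API, and only the standard-library `sort_unstable_by` is used; the voracious variant is not reproduced here.

```rust
use std::cmp::Ordering;

/// Hypothetical trait (not the crate's API): a delta type whose values
/// can be compared by absolute magnitude.
trait AbsOrd: Copy {
    fn abs_cmp(&self, other: &Self) -> Ordering;
}

macro_rules! impl_abs_ord_int {
    ($($t:ty),*) => {$(
        impl AbsOrd for $t {
            fn abs_cmp(&self, other: &Self) -> Ordering {
                // unsigned_abs avoids overflow on the minimum value.
                self.unsigned_abs().cmp(&other.unsigned_abs())
            }
        }
    )*};
}

macro_rules! impl_abs_ord_float {
    ($($t:ty),*) => {$(
        impl AbsOrd for $t {
            fn abs_cmp(&self, other: &Self) -> Ordering {
                // total_cmp gives a total order on floats.
                self.abs().total_cmp(&other.abs())
            }
        }
    )*};
}

impl_abs_ord_int!(i8, i16, i32, i64);
impl_abs_ord_float!(f32, f64);

/// Sorts deltas in place by absolute value, which is the ordering
/// the signed-rank test assigns ranks by.
fn sort_by_abs<T: AbsOrd>(deltas: &mut [T]) {
    deltas.sort_unstable_by(|a, b| a.abs_cmp(b));
}
```

The same `sort_by_abs` call then works on a `&mut [f32]` as well as on a `&mut [i8]`, which is the kind of flexibility the benchmark variants above exercise.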

LucaCappelletti94 · 2024-08-23T10:51:33Z

I have now introduced the quantized variant, which lets the user specify a signed integer target type for quantizing the deltas of the Wilcoxon test. This works because the Wilcoxon test does not depend on the absolute magnitudes of the deltas, only on their relative ordering by absolute value, so dividing the deltas by a positive scalar does not change the result. Of course, switching to integers allows for significantly faster sorting. Here are the benchmarks:

| Test Suite | Time (ms) | Outliers (%) | Outlier Breakdown |
|---|---|---|---|
| sort_unstable_f32 | 56.353 | 8.00% | 8 high severe |
| voracious_f32 | 31.818 | 5.00% | 4 high mild, 1 high severe |
| quantized_sort_unstable_f32_to_i8 | 28.876 | 10.00% | 2 high mild, 8 high severe |
| quantized_voracious_f32_to_i8 | 12.791 | 3.00% | 3 low mild |
| quantized_sort_unstable_f32_to_i16 | 53.360 | 13.00% | 3 high mild, 10 high severe |
| quantized_voracious_f32_to_i16 | 18.349 | 3.00% | 2 high mild, 1 high severe |
| sort_unstable_f64 | 55.714 | 16.00% | 5 high mild, 11 high severe |
| voracious_f64 | 38.008 | 10.00% | 7 high mild, 3 high severe |
| quantized_sort_unstable_f64_to_i8 | 28.964 | 9.00% | 2 high mild, 7 high severe |
| quantized_voracious_f64_to_i8 | 13.601 | 2.00% | 1 high mild, 1 high severe |
| quantized_sort_unstable_f64_to_i16 | 53.457 | 10.00% | 4 high mild, 6 high severe |
| quantized_voracious_f64_to_i16 | 18.639 | 5.00% | 2 high mild, 3 high severe |
| quantized_sort_unstable_f64_to_i32 | 39.384 | 9.00% | 4 high mild, 5 high severe |
| quantized_voracious_f64_to_i32 | 29.309 | 7.00% | 3 high mild, 4 high severe |
| sort_unstable_i64 | 38.498 | 10.00% | 10 high mild |
| voracious_i64 | 36.825 | 10.00% | 6 high mild, 4 high severe |
| sort_unstable_i32 | 36.873 | 12.00% | 6 high mild, 6 high severe |
| voracious_i32 | 24.377 | 14.00% | 2 low severe, 6 low mild, 5 high mild, 1 high severe |
| sort_unstable_i16 | 49.385 | 16.00% | 8 high mild, 8 high severe |
| voracious_i16 | 15.154 | 1.00% | 1 high mild |
| sort_unstable_i8 | 24.917 | 19.00% | 8 high mild, 11 high severe |
| voracious_i8 | 10.538 | 7.00% | 1 low severe, 6 high mild |
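The scale-invariance argument behind quantization can be checked with a small sketch. Both functions below are illustrative assumptions rather than the PR's implementation: `signed_rank_sums` computes the positive and negative rank sums with average ranks for ties and zero deltas dropped, and `quantize_to_i16` is a hypothetical quantizer that rescales by the largest absolute delta. Pure rescaling by a positive scalar leaves the rank sums unchanged. Rounding to integers preserves them only as long as it does not merge distinct deltas into ties or zeros.

```rust
/// Returns (W+, W-): the rank sums of the positive and negative deltas,
/// ranking by absolute value, averaging ranks over ties, dropping zeros.
/// Assumes the input contains no NaN values.
fn signed_rank_sums(deltas: &[f64]) -> (f64, f64) {
    let mut nonzero: Vec<f64> = deltas.iter().copied().filter(|d| *d != 0.0).collect();
    nonzero.sort_unstable_by(|a, b| a.abs().total_cmp(&b.abs()));

    let (mut w_plus, mut w_minus) = (0.0, 0.0);
    let mut i = 0;
    while i < nonzero.len() {
        // Find the run [i, j) of deltas sharing the same absolute value.
        let mut j = i + 1;
        while j < nonzero.len() && nonzero[j].abs() == nonzero[i].abs() {
            j += 1;
        }
        // The run occupies ranks i+1 ..= j, so its average rank is (i+1+j)/2.
        let avg_rank = (i + 1 + j) as f64 / 2.0;
        for d in &nonzero[i..j] {
            if *d > 0.0 {
                w_plus += avg_rank;
            } else {
                w_minus += avg_rank;
            }
        }
        i = j;
    }
    (w_plus, w_minus)
}

/// Hypothetical quantizer: rescales so the largest |delta| maps to i16::MAX,
/// then rounds to the nearest integer.
fn quantize_to_i16(deltas: &[f64]) -> Vec<i16> {
    let max_abs = deltas.iter().fold(0.0_f64, |m, d| m.max(d.abs()));
    if max_abs == 0.0 {
        return vec![0; deltas.len()];
    }
    let scale = f64::from(i16::MAX) / max_abs;
    deltas.iter().map(|d| (d * scale).round() as i16).collect()
}
```

For the deltas `[1.5, -0.5, 3.0, -2.0, 0.25]`, the rank sums are (9, 6) for the original values, for the values divided by any positive constant, and for their `i16` quantization, since the ordering by absolute value is the same in all three cases.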

LucaCappelletti94 · 2024-08-23T11:23:01Z

So, the initial improvement brought the Wilcoxon test part of my benchmark from 2 hours to 12 minutes, which allowed me to increase the number of tests to consider. After adding the new tests, the run time ballooned to at least 5 hours (I did not let it finish). Now the run time seems to be around 2 hours again, which means the improvement over the starting time requirements should be around 50x!

Licenser

Throwing in a few thoughts :)

src/traits/one.rs

src/traits/zero.rs

src/test/wilcoxon_w.rs

src/statistics/ranks.rs

LucaCappelletti94 · 2024-09-17T17:58:15Z

@larsgw do check out this PR when you have time. I think this is as good as it will get performance-wise.

larsgw · 2024-10-01T17:35:10Z

I am still very thankful, but I simply do not have the time to look at this in the next weeks, sorry.

Slightly improved speed of wilcoxon signed rank

4b7ede1

LucaCappelletti94 added 2 commits August 18, 2024 12:41

Removed a collect

e9d2ab5

Fused ranks operations and further sped up Wilcoxon

cf4edd1

LucaCappelletti94 changed the title ~~Slightly improved speed of wilcoxon signed rank~~ Massive improved speed of wilcoxon signed rank - from 2 hours to 12 minutes! Aug 18, 2024

Increased size of benchmark array

9c899b7

LucaCappelletti94 added 2 commits August 18, 2024 17:05

Increased size of benchmark and re-executed original benchmark

dc35f4f

Formatted code

10de000

Added generics to wilcoxon and added benchmark for f32, i32 and i64

97f8829

LucaCappelletti94 added 2 commits August 23, 2024 09:44

Added test suite including voracious

a0cf4ea

Implemented, tested and benched quantized wilcoxon

935aed3

Simplified branches and made implicit a division

c10f141

Removed clippy expect and formatted code, as it wasn't supported by CI

8c9c533

Licenser reviewed Aug 23, 2024

src/traits/one.rs

src/traits/zero.rs

src/test/wilcoxon_w.rs

src/statistics/ranks.rs

src/statistics/ranks.rs

Minimal improvements with use of inline

525d437

LucaCappelletti94 added 2 commits September 17, 2024 22:41

Merge branch 'main' into faster_wilcoxon

c576f31

Fixed merge error

68534bc

This was referenced Sep 23, 2024

Support for weighted Wilcoxon test #10

Open

Weighted wilcoxon #11

Open
