Massively improved speed of the Wilcoxon signed-rank test: from 2 hours to 12 minutes! #9
base: main
FYI, in my practical use case this change reduced the effective wall time from 2:20 hours to 1:27 hours, a reduction of roughly 38%.
I have obtained some more improvements; the benchmark now shows:
This is an overall speedup of 14% compared to the original. In my use case, which has much larger sample sizes than those I use in this benchmark, the speedup is massive: from the initial 2:20 hours, it now stands at just 12 minutes!
The command to run the benchmark is:
Having now increased the size of the array, the original version performs as follows:
While the improved version achieves the following:
I believe that the larger the sample array, the greater the time improvement, and in my case there are several million samples.
Thank you so much again! I have not worked on this in three years, and have not used a lot of Rust since, so I might need a bit more time to get back up to speed.
Not to worry, your code is understandable! Otherwise, I could not have optimized it so quickly.
New updates: I have added support for `f32` and signed integers, as well as an optional voracious sorting feature. Next, I will be adding methods for quantization. These are the new performance numbers:
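To illustrate what accepting both `f32` and signed integers can look like, here is a minimal sketch of the signed-rank statistic W+ written generically over the delta type. The `Delta` trait, its methods and `w_plus` are hypothetical names for this example, not the crate's actual API. The sketch drops zero deltas, gives tied magnitudes their average rank, and assumes float inputs contain no NaN.

```rust
/// Hypothetical trait: the operations the signed-rank statistic needs from a delta type.
trait Delta: Copy {
    /// Type used to order magnitudes (unsigned for integers, so `MIN` cannot overflow).
    type Abs: Copy + PartialOrd;
    fn magnitude(self) -> Self::Abs;
    fn is_zero(self) -> bool;
    fn is_positive(self) -> bool;
}

macro_rules! impl_delta_float {
    ($($t:ty),*) => {$(
        impl Delta for $t {
            type Abs = $t;
            fn magnitude(self) -> $t { self.abs() }
            fn is_zero(self) -> bool { self == 0.0 }
            fn is_positive(self) -> bool { self > 0.0 }
        }
    )*};
}

macro_rules! impl_delta_int {
    ($($t:ty => $u:ty),*) => {$(
        impl Delta for $t {
            type Abs = $u;
            fn magnitude(self) -> $u { self.unsigned_abs() }
            fn is_zero(self) -> bool { self == 0 }
            fn is_positive(self) -> bool { self > 0 }
        }
    )*};
}

impl_delta_float!(f32, f64);
impl_delta_int!(i8 => u8, i16 => u16, i32 => u32, i64 => u64);

/// Sum of the ranks of the positive deltas (W+), with zero deltas dropped
/// and tied magnitudes given their average rank. Assumes no NaN inputs.
fn w_plus<T: Delta>(deltas: &[T]) -> f64 {
    let mut nz: Vec<(T::Abs, bool)> = deltas
        .iter()
        .copied()
        .filter(|&d| !d.is_zero())
        .map(|d| (d.magnitude(), d.is_positive()))
        .collect();
    nz.sort_by(|a, b| a.0.partial_cmp(&b.0).expect("NaN delta"));

    let mut w = 0.0;
    let mut i = 0;
    while i < nz.len() {
        // Find the end of the run of equal magnitudes starting at i.
        let mut j = i + 1;
        while j < nz.len() && nz[j].0 == nz[i].0 {
            j += 1;
        }
        // Ranks i+1..=j are tied; each receives their average.
        let avg_rank = (i + 1 + j) as f64 / 2.0;
        w += avg_rank * nz[i..j].iter().filter(|p| p.1).count() as f64;
        i = j;
    }
    w
}
```

For example, `w_plus(&[1.5f32, -2.0, 3.0, 0.0, -1.5])` and `w_plus(&[3i32, -4, 6, 0, -3])` both give 5.5, since the integer deltas are just the float ones scaled by 2. Using `unsigned_abs` for integers avoids the overflow that `i32::MIN.abs()` would cause.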
I have now introduced the quantized variant, which lets the user specify a signed integer target for quantizing the deltas of the Wilcoxon test. This works because the test does not depend on the absolute values of the deltas, only on their signs and the relative order of their magnitudes, so dividing all deltas by a positive scalar does not change the result. Switching to integers also makes sorting significantly faster. Here are the benchmarks:
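A sketch of the quantization idea, under stated assumptions: `quantize` and `w_plus_i32` are hypothetical helpers, and here the scale is chosen so that the largest |delta| maps to `target_max`, which may differ from how the PR picks it. Dividing by a positive scale preserves signs and the order of the magnitudes, but the rounding step can merge close values into ties or round small deltas to zero. W+ is therefore only guaranteed to be unchanged when the target is fine enough to keep distinct magnitudes distinct.

```rust
/// Hypothetical helper: divide every delta by one positive scale, chosen so the
/// largest |delta| maps to `target_max`, then round to the nearest i32.
fn quantize(deltas: &[f64], target_max: i32) -> Vec<i32> {
    assert!(target_max > 0, "target_max must be positive");
    let max_abs = deltas.iter().fold(0.0_f64, |m, d| m.max(d.abs()));
    if max_abs == 0.0 {
        return vec![0; deltas.len()];
    }
    // A positive scale keeps every sign and the order of the magnitudes;
    // only the rounding below can merge values into ties or send them to zero.
    let scale = max_abs / f64::from(target_max);
    deltas.iter().map(|d| (d / scale).round() as i32).collect()
}

/// W+ on integer deltas. Integers have a total order, so they can be
/// sorted directly by key without any NaN handling.
fn w_plus_i32(deltas: &[i32]) -> f64 {
    let mut nz: Vec<(u32, bool)> = deltas
        .iter()
        .filter(|&&d| d != 0)
        .map(|&d| (d.unsigned_abs(), d > 0))
        .collect();
    nz.sort_unstable_by_key(|p| p.0);

    let mut w = 0.0;
    let mut i = 0;
    while i < nz.len() {
        // Tied magnitudes occupy ranks i+1..=j and share their average rank.
        let mut j = i + 1;
        while j < nz.len() && nz[j].0 == nz[i].0 {
            j += 1;
        }
        let avg_rank = (i + 1 + j) as f64 / 2.0;
        w += avg_rank * nz[i..j].iter().filter(|p| p.1).count() as f64;
        i = j;
    }
    w
}
```

For the deltas `[0.12, -0.5, 0.9, -0.33, 0.7]`, whose W+ is 10, quantizing with `target_max = 1000` keeps W+ at 10, while `target_max = 2` collapses them to `[0, -1, 2, -1, 2]` and W+ drops to 7. Choosing a target with enough resolution is what keeps the quantized test exact.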
So, the initial improvement brought the Wilcoxon test part of my benchmark from 2 hours to 12 minutes, which allowed me to increase the number of tests I consider. After adding the new tests, the run time ballooned to at least 5 hours (I didn't let it finish). Now the run time seems to be around 2 hours again, so the overall improvement over the original code should be at least 25x (10x from the first round times at least 2.5x from this one), and possibly around 50x, since the 5-hour run never finished!
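As a back-of-the-envelope check on the combined figure, assuming that the first round's roughly 10x speedup also holds on the enlarged set of tests, the two reported ratios compound as follows:

$$
\underbrace{\frac{120\ \text{min}}{12\ \text{min}}}_{\text{first round}} \times \underbrace{\frac{\geq 5\ \text{h}}{\approx 2\ \text{h}}}_{\text{second round}} \;\gtrsim\; 10 \times 2.5 = 25
$$

So 25x is a lower bound. Reaching 50x would require the unfinished run to have taken about 10 hours, since $10 \times (10\ \text{h} / 2\ \text{h}) = 50$.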
Throwing in a few thoughts :)
@larsgw, do check out this PR when you have time; I think this is as good as it will get performance-wise.
I am still very thankful, but I simply do not have the time to look at this in the coming weeks, sorry.
I am running a lot of Wilcoxon tests, so I need them to be as fast as possible. So far I have obtained a small speed improvement; the original performance was:
Wilcoxon signed-rank test time: [2.6232 ms 2.6244 ms 2.6259 ms]
and the updated performance is now:
That is a solid speedup of about 10%. Not much, but worth noting.