Update README.md
PGS62 authored Feb 8, 2024
1 parent 47920a9 commit b99bda9
Showing 1 changed file with 13 additions and 37 deletions.
50 changes: 13 additions & 37 deletions README.md
@@ -6,14 +6,14 @@

This (unregistered) Julia package exposes a function `corkendall` that is a candidate to replace the function of the same name in the StatsBase package.

The package also contains a function `speedtest` that prints a comparison of the execution speed of two (or more) implementations of Kendall Tau. `speedtest` demonstrates that the new version of `corkendall` is about ~~five~~ ~~six~~ seven times faster than the existing StatsBase version. See [# 634](https://github.com/JuliaStats/StatsBase.jl/issues/634).
The package also contains a function `speedtest` that prints a comparison of the execution speed of two (or more) implementations of Kendall Tau. `speedtest` demonstrates that the new version of `corkendall` is about seven times faster than the existing StatsBase version. See [# 634](https://github.com/JuliaStats/StatsBase.jl/issues/634).

## Update February 2024
The code of `corkendall` from this package was incorporated in StatsBase on 8 February 2021 (see [this](https://github.com/JuliaStats/StatsBase.jl/commit/11ac5b596405367b3217d3d962e22523fef9bb0d) commit).

More recently I have made further changes:

1) The function is now multi-threaded. On a PC with 12 cores and 20 logical processors this gives an approximate 12 times speed-up relative to `StatsBase.corkendall`
1) The function is now multi-threaded. On a PC with 12 cores and 20 logical processors this gives an approximate 9 times speed-up relative to `StatsBase.corkendall`
2) `KendallTau.corkendall` now has a `skipmissings` keyword argument, to control the treatment of missing values.
3) A new function `corkendall_fromfile` takes arguments as names of csv files containing the input and output data.

@@ -40,54 +40,27 @@ help?> KendallTau.corkendall
## Performance
In the REPL output below, note the large reduction in number and size of allocations. This was key to obtaining the full benefit of multi-threading.
```julia
julia> using StatsBase,KendallTau,Random #StatsBase v0.33.21
julia> using StatsBase,KendallTau,Random #StatsBase v0.34.2

julia> x = rand(1000,10);StatsBase.corkendall(x)==KendallTau.corkendall(x)#compile
true

julia> x = rand(1000,1000);

julia> @time res_sb = StatsBase.corkendall(x);
21.393938 seconds (3.00 M allocations: 17.082 GiB, 5.36% gc time)
17.309843 seconds (3.00 M allocations: 17.090 GiB, 4.80% gc time)

julia> @time res_kt = KendallTau.corkendall(x);
1.780313 seconds (2.28 k allocations: 8.876 MiB, 0.14% gc time)
1.850909 seconds (1.26 k allocations: 16.528 MiB)

julia> 21.393938/1.780313
12.016953198679108
julia> 17.309843/1.850909
9.352076736349545

julia> res_sb == res_kt
julia> res_sb==res_kt
true

julia> Threads.nthreads()#12 cores, 20 logical processors
20
```
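
For readers who want a feel for how a pairwise Kendall Tau matrix can be spread across threads, here is a minimal sketch. It is not the algorithm used by `KendallTau.corkendall` or `StatsBase.corkendall`: it uses a naive O(n²) tau-a kernel and assumes no ties and no missing values. The names `naive_tau` and `naive_tau_matrix` are hypothetical and exist only for this illustration. The kernel reads columns through `view` rather than copying them, and each output cell is written by exactly one task, so the threads need no locking.

```julia
# Illustrative sketch only: naive O(n^2) tau-a, NOT the package's algorithm.
# Assumes no ties, no NaNs and no missing values.
function naive_tau(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    n = length(x)
    n == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    s = 0.0  # concordant pairs minus discordant pairs
    for i in 1:n-1, j in i+1:n
        s += sign(x[i] - x[j]) * sign(y[i] - y[j])
    end
    return 2s / (n * (n - 1))
end

# Pairwise matrix over the columns of x, with the outer loop split across threads.
# The task for column j writes out[i, j] and out[j, i] for i < j, plus out[j, j],
# so no two tasks ever write the same cell.
function naive_tau_matrix(x::AbstractMatrix{<:Real})
    nc = size(x, 2)
    out = Matrix{Float64}(undef, nc, nc)
    Threads.@threads for j in 1:nc
        xj = view(x, :, j)  # a view avoids copying the column
        for i in 1:j-1
            t = naive_tau(view(x, :, i), xj)
            out[i, j] = t
            out[j, i] = t
        end
        out[j, j] = 1.0
    end
    return out
end

x = rand(200, 8)
tau = naive_tau_matrix(x)
```

When the data has no ties, tau-a and tau-b coincide, so on continuous random input such as `rand(200, 8)` the result should be comparable to `corkendall(x)` up to floating-point rounding. The sketch will be far slower on large inputs because of the quadratic kernel.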

### Performance against size of `x`
<img width="800" alt="image" src="plots/KendallTau vs StatsBase corkendall speed on 12 core 20 thread.svg">

### Performance for very large `x`
I wish to compute Kendall Tau for a set of 32,000 time series, each having observations every weekday over a four year period. Such a calculation takes some 42 minutes on my PC (Windows 11, 12th Gen Intel(R) Core(TM) i7-12700, 2100 Mhz, 12 Core(s), 20 Logical Processors), with Julia 1.8.5.

```julia
julia> Threads.nthreads()
20

julia> x = rand(1040,32000);

julia> @time KendallTau.corkendall(x);
2524.754279 seconds (64.28 k allocations: 7.633 GiB, 0.00% gc time)
```

**Update** 23 Jan 2024.

On Julia 1.10, performance seems to have improved by about 10%:

```julia
julia> x = rand(1040,32000);

julia> @time KendallTau.corkendall(x);
2283.363978 seconds (32.26 k allocations: 7.882 GiB, 0.00% gc time)

julia> versioninfo()
Julia Version 1.10.0
@@ -103,10 +76,13 @@ Platform Info:
Threads: 29 on 20 virtual cores
Environment:
JULIA_NUM_THREADS = 20
JULIA_PKG_DEVDIR = C:\Projects
JULIA_EDITOR = code
```


### Performance against size of `x`
<img width="800" alt="image" src="plots/KendallTau vs StatsBase corkendall speed on 12 core 20 thread.svg">


Philip Swannell
20 February 2023
8 February 2024
