Skip to content

Commit

Permalink
Merge pull request #8 from PGS62/dev
Browse files Browse the repository at this point in the history
Dev
  • Loading branch information
PGS62 authored Jan 23, 2024
2 parents 80f0ef6 + 902e409 commit 9f8ca50
Show file tree
Hide file tree
Showing 8 changed files with 165 additions and 96 deletions.
58 changes: 9 additions & 49 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,31 +68,28 @@ julia> Threads.nthreads()#12 cores, 20 logical processors
I wish to compute Kendall Tau for a set of 32,000 time series, each having observations every weekday over a four year period. Such a calculation takes some 42 minutes on my PC (Windows 11, 12th Gen Intel(R) Core(TM) i7-12700, 2100 Mhz, 12 Core(s), 20 Logical Processors), with Julia 1.8.5.

```julia
julia> Threads.nthreads()
20

julia> x = rand(1040,32000);

julia> @time KendallTau.corkendall(x);
2524.754279 seconds (64.28 k allocations: 7.633 GiB, 0.00% gc time)
```

<!---
Performance regression on Julia 1.10?
```julia
julia> x = rand(1040,100);
julia> @time KendallTau.corkendall(x);
1.558091 seconds (1.74 M allocations: 118.908 MiB, 2.43% gc time, 1806.03% compilation time)
**Update** 23 Jan 2024.

julia> @time KendallTau.corkendall(x);
0.021051 seconds (357 allocations: 2.074 MiB)
On Julia 1.10, performance seems to have improved by about 10%:

```julia
julia> x = rand(1040,32000);

julia> @time KendallTau.corkendall(x);
2907.826224 seconds (32.26 k allocations: 7.882 GiB, 0.00% gc time)
2283.363978 seconds (32.26 k allocations: 7.882 GiB, 0.00% gc time)

julia> versioninfo()
Julia Version 1.10.0-rc2
Commit dbb9c46795 (2023-12-03 15:25 UTC)
Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
Expand All @@ -104,45 +101,8 @@ Platform Info:
Threads: 29 on 20 virtual cores
Environment:
JULIA_NUM_THREADS = 20
JULIA_PKG_DEVDIR = C:\Projects
```

Compare with this on Julia 1.9.4
```julia
julia> using KendallTau
julia> x = rand(1040,100);
julia> @time KendallTau.corkendall(x);
1.192388 seconds (1.85 M allocations: 126.852 MiB, 2.25% gc time, 1809.97% compilation time)
julia> @time KendallTau.corkendall(x);
0.020833 seconds (391 allocations: 2.075 MiB)
julia> x = rand(1040,32000);
julia> @time KendallTau.corkendall(x);
2576.263435 seconds (32.28 k allocations: 7.882 GiB, 0.00% gc time)
julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa29 (2023-11-14 08:46 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, alderlake)
Threads: 20 on 20 virtual cores
Environment:
JULIA_NUM_THREADS = 20
JULIA_PKG_DEVDIR = C:\Projects
```
--->



Expand Down
5 changes: 5 additions & 0 deletions data/smallcorrels.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
,"a","b","c","d"
"a",7.27546190987574E-02,0.573761529289951,0.567573346744397,0.353022413642343
"b",0.630748492296052,0.882538005391427,0.387604390922472,0.771954267245282
"c",4.41291538720918E-02,6.41242010748875E-02,0.429207828355124,0.445919790376363
"d",0.13933271971074,0.838523913013751,0.641453185911449,0.151142763005595
5 changes: 5 additions & 0 deletions data/smallcorrels_shuffled.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
,"c","b","d","a"
"c",0.429207828355124,6.41242010748875E-02,0.445919790376363,4.41291538720918E-02
"b",0.387604390922472,0.882538005391427,0.771954267245282,0.630748492296052
"d",0.641453185911449,0.838523913013751,0.151142763005595,0.13933271971074
"a",0.567573346744397,0.573761529289951,0.353022413642343,7.27546190987574E-02
55 changes: 35 additions & 20 deletions scripts/for_aiden.jl
Original file line number Diff line number Diff line change
Expand Up @@ -4,41 +4,56 @@ using Statistics
"""
test_corkendall_fromfile()
Compare KendallTau calculation by ISDA (Python implementation?) with `corkendall_fromfile`.
Compare KendallTau calculation by ISDA (Python implementation) with `corkendall_fromfile`.
Returns maximum absolute difference and median absolute difference. Data for
ISDA SIMM calibration VIII, 5853 time series, with 1044 observations in each series.
ISDA SIMM 2023 calibration: 5,853 time series, each with 1,044 observations.
2024 calibration: 5,021 time series, each with 1,041 observations.
# Example
# Examples
```
julia> @time test_corkendall_fromfile()
67.175902 seconds (215.12 M allocations: 7.915 GiB, 2.59% gc time, 0.01% compilation time)
julia> @time test_corkendall_fromfile(2023)
67.109173 seconds (216.00 M allocations: 8.827 GiB, 1.33% gc time)
(1.683168734664675e-5, 1.734723475976807e-18)
julia> @time test_corkendall_fromfile(2024)
51.649551 seconds (160.94 M allocations: 7.011 GiB, 1.12% gc time, 4.74% compilation time)
(3.3306690738754696e-16, 0.0)
```
"""
function test_corkendall_fromfile()

base_folder = "C:/Users/phili/OneDrive/ISDA SIMM/Solum Validation C-VIII 2023/EQ_delta"
function test_corkendall_fromfile(year)

if year == 2023
#ISDA's file for 2023 is kendall tau, not converted to Pearson
converttopearson = false
base_folder = "C:/Users/phili/OneDrive/ISDA SIMM/Solum Validation C-VIII 2023/EQ_delta"
#ISDA's calculation of KendallTau
isda_results_file = joinpath(base_folder, "9_correlations/eq_delta-kendall_recent_1-10d.csv")
#ISDA's returns data
file1 = joinpath(base_folder, "7_returns_relevant_period/returns-10d_recent_1.csv")
julia_results_file = "c:/temp/correlations_2023.csv"
elseif year == 2024
#ISDA's file for 2024 is kendall tau, converted to Pearson ρ = sin(τπ/2)
converttopearson = true
base_folder = raw"C:\Users\phili\OneDrive\ISDA SIMM\Solum Validation C-IX 2024\EQ_Delta"
#ISDA's calculation of KendallTau
isda_results_file = joinpath(base_folder, raw"9_correlations\eq_delta-individual_recent_0-10d.csv")
#ISDA's returns data
file1 = joinpath(base_folder, raw"7_returns_relevant_period\returns-10d_recent_0.csv")
julia_results_file = "c:/temp/correlations_2024.csv"
else
throw("year must be 2023 or 2024 but got $year")
end

#ISDA's calculation of KendallTau
file_isda = joinpath(base_folder, "9_correlations/eq_delta-kendall_recent_1-10d.csv")

#ISDA's raw data
file1 = joinpath(base_folder, "7_returns_relevant_period/returns-10d_recent_1.csv")
file2 = ""
outputfile = "c:/temp/test.csv"
inputshaveheaderrow = true
inputshaveheadercol = true
writeheaders = true
converttopearson = false
missingstring = ""
whattoreturn = "filename"

corkendall_fromfile(file1, file2, outputfile, inputshaveheaderrow, inputshaveheadercol,
corkendall_fromfile(file1, file2, julia_results_file, inputshaveheaderrow, inputshaveheadercol,
writeheaders, converttopearson, missingstring, whattoreturn)

absdiffs = (abs.(KendallTau.readfromcsv(outputfile, true, true)[1] .-
KendallTau.readfromcsv(file_isda, true, true)[1]))

maximum(absdiffs), median(absdiffs)
KendallTau.comparecorrelationfiles(julia_results_file, isda_results_file)

end
4 changes: 2 additions & 2 deletions scripts/performancetests.jl
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
using BenchmarkTools
using Dates
using Plotly
using PlotlyJS
#using Plotly
#using PlotlyJS
using Random
using NamedArrays

Expand Down
27 changes: 23 additions & 4 deletions scripts/quickcheckallocations.jl
Original file line number Diff line number Diff line change
Expand Up @@ -15,17 +15,21 @@ xm = ifelse.(x .< 0.05, missing, x);
println("="^100)
@show(Dates.now())
@show ENV["COMPUTERNAME"]
println("Julia Version $VERSION")
@show Threads.nthreads()
@show size(x)
@show typeof(x)
@show size(xm)
@show typeof(xm)

print("KendallTau.corkendall(x)$(" "^26)")
@time res_kt = KendallTau.corkendall(x)
@time res_1 = KendallTau.corkendall(x)
print("KendallTau.corkendall(xm; skipmissing = :pairwise)")
@time res_kt = KendallTau.corkendall(xm; skipmissing=:pairwise)
@time res_2 = KendallTau.corkendall(xm; skipmissing=:pairwise)
print("KendallTau.corkendall(xm; skipmissing = :listwise)")
@time res_kt = KendallTau.corkendall(xm; skipmissing=:listwise)
@time res_3 = KendallTau.corkendall(xm; skipmissing=:listwise)
println("="^100)
nothing


#=
====================================================================================================
Expand All @@ -38,4 +42,19 @@ KendallTau.corkendall(x) 1.752958 seconds (2.28 k all
KendallTau.corkendall(xm; skipmissing = :pairwise) 1.770887 seconds (2.28 k allocations: 8.938 MiB)
KendallTau.corkendall(xm; skipmissing = :listwise) 0.003478 seconds (2.29 k allocations: 7.747 MiB)
====================================================================================================
====================================================================================================
Dates.now() = DateTime("2024-01-23T16:25:13.336")
ENV["COMPUTERNAME"] = "DESKTOP-HSGAM5S"
Julia Version 1.10.0
Threads.nthreads() = 20
size(x) = (1000, 1000)
typeof(x) = Matrix{Float64}
size(xm) = (1000, 1000)
typeof(xm) = Matrix{Union{Missing, Float64}}
KendallTau.corkendall(x) 1.716595 seconds (1.26 k allocations: 16.528 MiB)
KendallTau.corkendall(xm; skipmissing = :pairwise) 1.738789 seconds (1.26 k allocations: 16.236 MiB)
KendallTau.corkendall(xm; skipmissing = :listwise) 0.002477 seconds (268 allocations: 22.915 MiB)
====================================================================================================
=#
15 changes: 10 additions & 5 deletions src/corkendall.jl
Original file line number Diff line number Diff line change
Expand Up @@ -73,8 +73,9 @@ function corkendall(x::RoMVector{T}, y::RoMVector{U}; skipmissing::Symbol=:none)
length(x) == length(y) || throw(DimensionMismatch("x and y have inconsistent dimensions"))
(x isa Vector{Missing} || y isa Vector{Missing}) && return NaN

x = copy(x)
x, y = handlelistwise(x, y, skipmissing)
x, y = handlepairwise(copy(x), copy(y))
x, y = handlepairwise(x, y)
permx = sortperm(x)
permute!(x, permx)

Expand Down Expand Up @@ -128,7 +129,11 @@ function corkendall(x::RoMMatrix{T}, y::RoMMatrix{U}=x; skipmissing::Symbol=:non
tys = duplicate(Vector{U}(undef, m), n_duplicates)
sortyspaces = duplicate(Vector{U}(undef, m), n_duplicates)

Threads.@threads for j = (symmetric ? 2 : 1):nr
#= Use the "static scheduler". This is the "quickfix, but not recommended longterm"
way of avoiding concurrency bugs. See https://julialang.org/blog/2023/07/PSA-dont-use-threadid/#fixing_buggy_code_which_uses_this_pattern
TODO Adopt a "better fix" as outlined in that blog.
=#
Threads.@threads :static for j = (symmetric ? 2 : 1):nr

if use_atomic
id = Threads.atomic_add!(a, 1)[]
Expand Down Expand Up @@ -161,8 +166,8 @@ function corkendall(x::RoMMatrix{T}, y::RoMMatrix{U}=x; skipmissing::Symbol=:non
return C
end

# Function returns a vector in this case, inconsistent with with Statistics.cor and
# StatsBase.corspearman. Fixing that is a breaking change.
# corkendall returns a vector in this case, inconsistent with with Statistics.cor and
# StatsBase.corspearman, but consistent with StatsBase.corkendall.
function corkendall(x::RoMMatrix, y::RoMVector; skipmissing::Symbol=:none)
return vec(corkendall(x, reshape(y, (length(y), 1)); skipmissing))
end
Expand Down Expand Up @@ -366,7 +371,7 @@ end
handlepairwise(x::RoMVector{T}, y::RoMVector{U}) where {T,U}
Return a pair `(a,b)`, filtered copies of `x` and `y`, in which elements `x[i]` and `y[i]`
are filtered out if `ismissing(x[i])||ismissing(y[i])`.
are filtered out if `ismissing(x[i])||ismissing(y[i])`.
"""
handlepairwise(x::AbstractVector{<:Real}, y::AbstractVector{<:Real}) = x, y

Expand Down
Loading

0 comments on commit 9f8ca50

Please sign in to comment.