comments only
PGS62 committed Jan 21, 2021
1 parent b12d8b5 commit ba78580
Showing 9 changed files with 97 additions and 32 deletions.
11 changes: 0 additions & 11 deletions README.md
@@ -176,25 +176,14 @@ Results from all 3 functions identical? true
</p>
</details>





## Other features:
A function `corkendallnaive` that implements the obvious O(N^2) algorithm. This function is not exported, but it is used by the function `compare_implementations` in
`test/rankcorr.jl`, which is quite a thorough test harness and could be copied over to `StatsBase/test/rankcorr.jl`.
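
For illustration only, the O(N^2) idea can be sketched as below. This is a minimal sketch of Kendall's tau-b for two vectors, not the repository's `corkendallnaive`; the name `kendall_tau_naive` is hypothetical, and returning `NaN` whenever an input contains `NaN` is an assumption taken from the behaviour described under "To do".

```julia
# Hypothetical sketch of the O(N^2) approach; not the repository's `corkendallnaive`.
function kendall_tau_naive(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    n = length(x)
    n == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    # Assumption: any NaN in the inputs yields NaN.
    (any(isnan, x) || any(isnan, y)) && return NaN
    numerator = 0   # concordant pairs minus discordant pairs
    nx = 0          # pairs that are not tied in x
    ny = 0          # pairs that are not tied in y
    for i in 2:n, j in 1:(i - 1)
        sx = (x[i] > x[j]) - (x[i] < x[j])   # -1, 0 or 1
        sy = (y[i] > y[j]) - (y[i] < y[j])   # -1, 0 or 1
        numerator += sx * sy
        nx += abs(sx)
        ny += abs(sy)
    end
    # tau-b: ties in x and in y shrink the respective pair counts.
    return numerator / sqrt(nx * ny)
end

tau = kendall_tau_naive([1, 2, 3, 4], [1, 3, 2, 4])
```

Every pair of observations is visited once, which is what makes the cost quadratic and also what makes a function of this kind a convenient reference for testing a faster implementation.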

The functions `corkendallthreads_v1`, `corkendallthreads_v2` and `corkendallthreads_v3` are experimental for the time being.


## To do
If either `x` or `y` contains `NaN` values, the function currently returns `NaN`. The Kendall Tau calculators in both Python and R allow alternative (and often useful) handling of `NaN` values, and I would like to implement something similar. See the `nan_policy` argument of the Python function `scipy.stats.kendalltau` [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html) and the `use` argument of the R function `cor` [here](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/cor). For my own projects, I particularly need an equivalent of R's `use = "pairwise.complete.obs"`.
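
As a rough sketch of what pairwise-complete handling could look like for a single pair of vectors: drop every position where either vector holds `NaN`, then correlate what remains. The helper name `pairwise_complete` is hypothetical and nothing like it exists in the package yet; this is one possible approach, not a planned design.

```julia
# Hypothetical helper: keep only positions where both x and y are non-NaN.
function pairwise_complete(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    keep = .!(isnan.(x) .| isnan.(y))   # true where both observations are present
    return x[keep], y[keep]
end

x = [1.0, NaN, 3.0, 4.0]
y = [2.0, 5.0, NaN, 8.0]
xc, yc = pairwise_complete(x, y)   # positions 2 and 3 are dropped
```

The filtered vectors could then be passed to `corkendall`. For matrix arguments the filtering would have to be repeated for each pair of columns, since different pairs can have different complete rows.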




Philip Swannell
18 Jan 2021



7 changes: 1 addition & 6 deletions src/KendallTau.jl
@@ -14,9 +14,4 @@ include("threads_v3.jl")

export corkendall


end # module




end # module
3 changes: 1 addition & 2 deletions src/rankcorr.jl
@@ -161,13 +161,12 @@ function mergesort!(v::AbstractVector, lo::Integer, hi::Integer, small_threshold
return nswaps
end


"""
countties(x::RealVector,lo::Int64,hi::Int64)
Assumes `x` is sorted. Returns the number of ties within `x[lo:hi]`.
"""
-function countties(x::RealVector, lo::Int64, hi::Int64)
+function countties(x::AbstractVector, lo::Integer, hi::Integer)
thistiecount, result = 0, 0
for i in (lo + 1):hi
if x[i] == x[i - 1]
2 changes: 0 additions & 2 deletions src/speedtestresults.txt
@@ -144,7 +144,6 @@ KendallTau.corkendallthreads_v1(vector1,vector2)
all(myapprox.(results[2:end], results[1:end - 1], 1.0e-14)) = true
###################################################################


###################################################################
Executing speedtest 2021-01-17T10:44:51.955
--------------------------------------------------
@@ -175,7 +174,6 @@ KendallTau.corkendallthreads_v2(matrix1,matrix2)
all(myapprox.(results[2:end], results[1:end - 1], 1.0e-14)) = true
###################################################################


###################################################################
Executing speedtest 2021-01-18T09:28:05.553
size(matrix1) = (2000, 10)
6 changes: 1 addition & 5 deletions src/speedtestresults2.txt
@@ -3,8 +3,6 @@ Recent changes:
mergesort more memory efficient via correct use of buffer and resize! function.
mergesort! & merge! refactored to be a bit more similar to functions in base/sort.jl



julia> speedtest([StatsBase.corkendall,KendallTau.corkendall,KendallTau.corkendallthreads_v2],2000,10)
###################################################################
Executing speedtest 2021-01-19T15:48:47.282
@@ -90,6 +88,4 @@ KendallTau.corkendallthreads_v2(manyrepeats1,manyrepeats2)
Speed ratio KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 2.461948941847068
Ratio of memory allocated KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 2.275738940686372
Results from all 3 functions identical? true
###################################################################


###################################################################
91 changes: 91 additions & 0 deletions src/speedtestresults3.txt
@@ -0,0 +1,91 @@
PGS 21 Jan 2021
mergesort! and insertionsort! now very similar indeed to sort! in base/sort.jl
Note the speed improvements and the fact that the code (as reported by @btime) is now more
memory efficient than the StatsBase version, which is nice.

julia> KendallTau.speedtest([StatsBase.corkendall,KendallTau.corkendall,KendallTau.corkendallthreads_v2],2000,10)
###################################################################
Executing speedtest 2021-01-21T14:56:19.489
size(matrix1) = (2000, 10)
StatsBase.corkendall(matrix1)
33.684 ms (451 allocations: 5.54 MiB)
Main.KendallTau.corkendall(matrix1)
5.394 ms (298 allocations: 3.40 MiB)
Speed ratio Main.KendallTau.corkendall vs StatsBase.corkendall: 6.244948088546108
Ratio of memory allocated Main.KendallTau.corkendall vs StatsBase.corkendall: 0.6130525086357451
Main.KendallTau.corkendallthreads_v2(matrix1)
1.706 ms (614 allocations: 3.44 MiB)
Speed ratio Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 19.738646938177556
Ratio of memory allocated Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 0.6202723771851052
Results from all 3 functions identical? true
--------------------------------------------------
size(matrix1) = (2000, 10)
size(matrix2) = (2000, 10)
StatsBase.corkendall(matrix1,matrix2)
76.453 ms (1001 allocations: 12.31 MiB)
Main.KendallTau.corkendall(matrix1,matrix2)
11.200 ms (631 allocations: 7.24 MiB)
Speed ratio Main.KendallTau.corkendall vs StatsBase.corkendall: 6.826188109481081
Ratio of memory allocated Main.KendallTau.corkendall vs StatsBase.corkendall: 0.5880152134243097
Main.KendallTau.corkendallthreads_v2(matrix1,matrix2)
3.925 ms (712 allocations: 7.25 MiB)
Speed ratio Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 19.481024466550014
Ratio of memory allocated Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 0.588845802919708
Results from all 3 functions identical? true
--------------------------------------------------
size(vector1) = (2000,)
size(matrix1) = (2000, 10)
StatsBase.corkendall(vector1,matrix1)
7.374 ms (103 allocations: 1.23 MiB)
Main.KendallTau.corkendall(vector1,matrix1)
1.096 ms (65 allocations: 725.55 KiB)
Speed ratio Main.KendallTau.corkendall vs StatsBase.corkendall: 6.726540843328325
Ratio of memory allocated Main.KendallTau.corkendall vs StatsBase.corkendall: 0.5755739005404333
Main.KendallTau.corkendallthreads_v2(vector1,matrix1)
464.500 μs (133 allocations: 734.48 KiB)
Speed ratio Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 15.875780409041981
Ratio of memory allocated Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 0.5826639892904953
Results from all 3 functions identical? true
--------------------------------------------------
size(matrix1) = (2000, 10)
size(vector1) = (2000,)
StatsBase.corkendall(matrix1,vector1)
7.379 ms (101 allocations: 1.23 MiB)
Main.KendallTau.corkendall(matrix1,vector1)
1.097 ms (63 allocations: 725.45 KiB)
Speed ratio Main.KendallTau.corkendall vs StatsBase.corkendall: 6.725142622801422
Ratio of memory allocated Main.KendallTau.corkendall vs StatsBase.corkendall: 0.5755423329614479
Main.KendallTau.corkendallthreads_v2(matrix1,vector1)
474.300 μs (134 allocations: 734.52 KiB)
Speed ratio Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 15.558716002530044
Ratio of memory allocated Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 0.5827321185074997
Results from all 3 functions identical? true
--------------------------------------------------
size(vector1) = (2000,)
size(vector2) = (2000,)
StatsBase.corkendall(vector1,vector2)
733.000 μs (10 allocations: 126.03 KiB)
Main.KendallTau.corkendall(vector1,vector2)
180.999 μs (8 allocations: 86.72 KiB)
Speed ratio Main.KendallTau.corkendall vs StatsBase.corkendall: 4.049746131194095
Ratio of memory allocated Main.KendallTau.corkendall vs StatsBase.corkendall: 0.6880733944954128
Main.KendallTau.corkendallthreads_v2(vector1,vector2)
183.900 μs (10 allocations: 118.22 KiB)
Speed ratio Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 3.9858618814573137
Ratio of memory allocated Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 0.9380114059013142
Results from all 3 functions identical? true
--------------------------------------------------
size(manyrepeats1) = (2000,)
size(manyrepeats2) = (2000,)
StatsBase.corkendall(manyrepeats1,manyrepeats2)
442.500 μs (12 allocations: 157.53 KiB)
Main.KendallTau.corkendall(manyrepeats1,manyrepeats2)
148.201 μs (14 allocations: 126.38 KiB)
Speed ratio Main.KendallTau.corkendall vs StatsBase.corkendall: 2.9858098123494443
Ratio of memory allocated Main.KendallTau.corkendall vs StatsBase.corkendall: 0.8022217813925808
Main.KendallTau.corkendallthreads_v2(manyrepeats1,manyrepeats2)
150.700 μs (16 allocations: 157.88 KiB)
Speed ratio Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 2.936297279362973
Ratio of memory allocated Main.KendallTau.corkendallthreads_v2 vs StatsBase.corkendall: 1.0021821067248562
Results from all 3 functions identical? true
###################################################################
7 changes: 3 additions & 4 deletions src/speedtests.jl
@@ -5,8 +5,9 @@ using Dates
"""
@btimed expression [other parameters...]
-An amended version of BenchmarkTools.@btime. Identical except the return is a tuple of the result of the `expression` evaluation, the trialmin (of type BenchmarkTools.TrialEstimate) and the memory allocated (a number of bytes).
+An amended version of BenchmarkTools.@btime. Identical except the return is a tuple of
+the result of the `expression` evaluation, the trialmin (of type BenchmarkTools.TrialEstimate)
+and the memory allocated (a number of bytes).
"""
macro btimed(args...)
_, params = BenchmarkTools.prunekwargs(args...)
Expand All @@ -29,7 +30,6 @@ macro btimed(args...)
end)
end


"""
speedtest(functions, nr::Int, nc::Int)
@@ -204,7 +204,6 @@ function myapprox(x::Float64, y::Float64, abstol::Float64)
end
end


"""
speedtest_repeatdensity(functions,nr)
1 change: 0 additions & 1 deletion src/threads_v3.jl
@@ -25,7 +25,6 @@ function corkendallthreads_v3(X::RealMatrix, Y::RealMatrix)
return C
end


# The thinking here is that corkendall is more efficient if the Y argument has more columns than X (only a hunch; not actually tested).
function corkendallthreads_v4(X::RealMatrix, Y::RealMatrix)
nr = size(X, 2)
1 change: 0 additions & 1 deletion test/rankcorr.jl
@@ -94,7 +94,6 @@ function corkendallnaive(X::RealMatrix)
return C
end


"""
compare_implementations(fn1, fn2, abstol::Float64=1e-14, maxcols=10, maxrows=500, numtests=100)