Best practices for high online statistics for many parameters #261

Closed
@vandenman

Related to #158

I'm running MCMC with too many parameters to save every sample. Instead, I'd like to use OnlineStats to track only the few statistics I'm interested in (e.g., mean, variance, autocorrelation). However, it's unclear to me how to do this properly. Right now, I have a function like this:

using OnlineStats

function initialize_online_statistics(online_statistics, p::Integer, k::Integer)

    # number of unique entries of a symmetric p × p matrix, diagonal included
    ne_plus_diag = p * (p + 1) ÷ 2
    # the same count without the diagonal
    ne = ne_plus_diag - p

    # not sure about this structure
    return (
        a = [OnlineStats.Group([copy(online_statistics) for _ in 1:ne_plus_diag]) for _ in 1:k],
        b = OnlineStats.Group([copy(online_statistics) for _ in 1:ne]),
        c = copy(online_statistics)
    )
end

The idea is that a user (just me, for now) supplies a Series, which is passed as online_statistics. Internally, I run an MCMC algorithm and, after each iteration, update the statistics for a, b, and c. This way, the user chooses which statistics to track, and they are not baked into the code.
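To make the intended update step concrete, here is a minimal sketch at a small size where Group still works. The random draws are placeholders for the sampler output, the names b_draw and c_draw are made up for illustration, and it assumes that fit! on a Group built from a vector accepts a vector of draws with one entry per statistic:

```julia
using OnlineStats

template = Series(Mean(), Variance())

stats = (
    b = Group([copy(template) for _ in 1:3]),  # one Series per entry of a length-3 parameter
    c = copy(template),                        # a single scalar parameter
)

for iter in 1:1_000
    b_draw = randn(3)  # placeholder for the sampler's current draw of b
    c_draw = randn()   # placeholder for the sampler's current draw of c

    fit!(stats.b, b_draw)  # assumed: element i of the draw updates statistic i
    fit!(stats.c, c_draw)
end

# value on a Series returns one value per tracked statistic
c_mean, c_var = value(stats.c)
```

Only the running statistics are kept between iterations, so memory use does not grow with the number of samples.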

Example usage:

initialize_online_statistics(OnlineStats.Series(Mean(), Variance(), AutoCov(5)), 5, 10)

works fine. However, for a larger problem size, the same call fails with a StackOverflowError:

initialize_online_statistics(OnlineStats.Series(Mean(), Variance(), AutoCov(5)), 116, 100)
ERROR: StackOverflowError:
 [1] promote_type(::Type, ::Type, ::Type, ::Vararg{Type}) (repeats 6161 times)
   @ Base ./promotion.jl:293
 [2] Group(stats::Vector{OnlineStatsBase.Series{Number, Tuple{Mean{Float64, EqualWeight}, Variance{Float64, Float64, EqualWeight}, AutoCov{Float64, Float64}}}})
   @ OnlineStatsBase ~/.julia/packages/OnlineStatsBase/4TwKN/src/stats.jl:368
 [3] #26
   @ ./none:0 [inlined]
 [4] iterate
   @ ./generator.jl:47 [inlined]
 [5] collect(itr::Base.Generator{UnitRange{Int64}, var"#26#29"{OnlineStatsBase.Series{Number, Tuple{Mean{Float64, EqualWeight}, Variance{Float64, Float64, EqualWeight}, AutoCov{Float64, Float64}}}, Int64}})
   @ Base ./array.jl:782
 [6] initialize_online_statistics(online_statistics::OnlineStatsBase.Series{Number, Tuple{Mean{Float64, EqualWeight}, Variance{Float64, Float64, EqualWeight}, AutoCov{Float64, Float64}}}, p::Int64, k::Int64)
 [7] top-level scope

I think the problem is that Group does not specialize when its contents all have the same type. For example, Group(Mean(), Mean()) has type Group{Tuple{Mean{Float64, EqualWeight}, Mean{Float64, EqualWeight}}, Union{Tuple{Number, Number}, NamedTuple{names, R} where R<:Tuple{Number, Number}, AbstractVector{<:Number}} where names}, and Group(Mean(), Mean(), Mean()) carries an additional Mean{Float64, EqualWeight} entry. With enough elements, this eventually ends in a StackOverflowError.

For now, I can use a Vector{T} where {T<:OnlineStat}, but I feel like I'm reinventing the idea behind Group. Perhaps there should be a specialized type for this case, like MonoGroup{T, U<:Int} where {T<:Union{Series, OnlineStat}}?
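For reference, the Vector workaround I have in mind looks roughly like the sketch below. make_stats and fit_each! are names I made up for this post, not part of OnlineStats, and the random draws stand in for MCMC output:

```julia
using OnlineStats

# Hypothetical helper: one independent copy of the template per parameter.
make_stats(template, n::Integer) = [copy(template) for _ in 1:n]

# Hypothetical helper: update the i-th statistic with the i-th parameter value.
function fit_each!(stats::AbstractVector, x::AbstractVector)
    length(stats) == length(x) ||
        throw(DimensionMismatch("need one value per statistic"))
    for i in eachindex(stats, x)
        fit!(stats[i], x[i])
    end
    return stats
end

template = Series(Mean(), Variance())
n = 116 * 117 ÷ 2                 # 6786, the size at which Group failed above
stats = make_stats(template, n)

for _ in 1:100                    # stand-in for MCMC iterations
    fit_each!(stats, randn(n))    # placeholder draws
end

# value on a Series returns one value per tracked statistic; take the mean
means = [value(s)[1] for s in stats]
```

Because every element has the same concrete type, the vector is concretely typed and its type does not grow with n, which is the property I would want from something like MonoGroup.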
