StatsBase.jl
StatsBase.jl copied to clipboard
The majority of sampling methods are unsafe and incorrect in the presence of offset axes
These methods are documented here and all contain unsafe uses of @inbounds. Not all sampling methods suffer from this issue, but a number of them do. When I filed my previous issue about knuths_sample! I had not realized how widespread this problem was. I believe the list below is the full list of sampling methods in sampling.jl with this problem (together with Fisher-Yates and Knuth's).
julia> StatsBase.direct_sample!(OffsetArray(collect(1:11), -5:5), zeros(11))
11-element Array{Float64,1}:
10.0
8.0
0.0
9.0
11.0
4.412732912e9
7.0
7.0
7.0
4.424086144e9
4.412732912e9
julia> StatsBase.naive_wsample_norep!(OffsetArray(collect(1:11), -5:5), Weights(ones(11)), zeros(11))
11-element Array{Float64,1}:
11.0
4.557992976e9
7.0
0.0
559108.0
5.0
8.0
10.0
9.0
11.0
4.424077344e9
julia> StatsBase.efraimidis_a_wsample_norep!(OffsetArray(collect(1:11), -5:5), Weights(ones(11)), zeros(11))
11-element Array{Float64,1}:
0.0
10.0
9.0
11.0
4.557992976e9
7.0
559108.0
0.0
4.975392768e9
11.0
8.0
julia> StatsBase.efraimidis_ares_wsample_norep!(OffsetArray(collect(1:11), -5:5), Weights(ones(11)), zeros(11))
11-element Array{Float64,1}:
7.0
0.0
11.0
28.0
10.0
4.399872352e9
559108.0
11.0
4.557992976e9
9.0
8.0
julia> StatsBase.efraimidis_aexpj_wsample_norep!(OffsetArray(collect(1:11), -5:5), Weights(ones(11)), zeros(11))
11-element Array{Float64,1}:
4.3920316e9
559108.0
0.0
11.0
10.0
7.0
9.0
11.0
4.417712592e9
8.0
4.557992976e9
Yes, lots of JuliaStats packages have been written before offset axes existed. Feel free to make a PR adding checks.