StatsBase.jl icon indicating copy to clipboard operation
StatsBase.jl copied to clipboard

The majority of sampling methods are unsafe and incorrect in the presence of offset axes

Open yurivish opened this issue 5 years ago • 1 comments

These methods are documented here and all contain unsafe uses of @inbounds. Not all sampling methods suffer from this issue, but a number of them do. When I filed my previous issue about knuths_sample! I had not realized how widespread this problem was. I believe the list below is the full list of sampling methods in sampling.jl with this problem (together with Fisher-Yates and Knuth's).

julia> StatsBase.direct_sample!(OffsetArray(collect(1:11), -5:5), zeros(11))
11-element Array{Float64,1}:
 10.0
  8.0
  0.0
  9.0
 11.0
  4.412732912e9
  7.0
  7.0
  7.0
  4.424086144e9
  4.412732912e9
julia> StatsBase.naive_wsample_norep!(OffsetArray(collect(1:11), -5:5), Weights(ones(11)), zeros(11))
11-element Array{Float64,1}:
     11.0
      4.557992976e9
      7.0
      0.0
 559108.0
      5.0
      8.0
     10.0
      9.0
     11.0
      4.424077344e9
julia> StatsBase.efraimidis_a_wsample_norep!(OffsetArray(collect(1:11), -5:5), Weights(ones(11)), zeros(11))
11-element Array{Float64,1}:
      0.0
     10.0
      9.0
     11.0
      4.557992976e9
      7.0
 559108.0
      0.0
      4.975392768e9
     11.0
      8.0

julia> StatsBase.efraimidis_ares_wsample_norep!(OffsetArray(collect(1:11), -5:5), Weights(ones(11)), zeros(11))
11-element Array{Float64,1}:
      7.0
      0.0
     11.0
     28.0
     10.0
      4.399872352e9
 559108.0
     11.0
      4.557992976e9
      9.0
      8.0

julia> StatsBase.efraimidis_aexpj_wsample_norep!(OffsetArray(collect(1:11), -5:5), Weights(ones(11)), zeros(11))
11-element Array{Float64,1}:
      4.3920316e9
 559108.0
      0.0
     11.0
     10.0
      7.0
      9.0
     11.0
      4.417712592e9
      8.0
      4.557992976e9

yurivish avatar Jan 25 '21 09:01 yurivish

Yes, lots of JuliaStats packages have been written before offset axes existed. Feel free to make a PR adding checks.

nalimilan avatar Jan 25 '21 11:01 nalimilan