SentinelArrays.jl

Performance of SentinelArrays

Open bkamins opened this issue 3 years ago • 0 comments

In some practical cases, SentinelVector is much slower than Vector. One example is the data used in https://bkamins.github.io/julialang/2022/12/23/duckdb.html.

We have:

julia> summary(posts)
"42710197×3 DataFrame"

julia> typeof.(eachcol(posts))
3-element Vector{DataType}:
 SentinelArrays.ChainedVector{Union{Missing, Int64}, SentinelArrays.SentinelVector{Int64, Int64, Missing, Vector{Int64}}}
 SentinelArrays.ChainedVector{Union{Missing, Int64}, SentinelArrays.SentinelVector{Int64, Int64, Missing, Vector{Int64}}}
 SentinelArrays.ChainedVector{Union{Missing, Int64}, SentinelArrays.SentinelVector{Int64, Int64, Missing, Vector{Int64}}}

julia> @time dropmissing(posts);
  0.819397 seconds (137 allocations: 1.822 GiB)

julia> @time dropmissing(copy(posts));
  0.560146 seconds (130 allocations: 2.657 GiB)

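As you can see, it is faster to copy the data frame (which converts the sentinel vectors to plain Vector columns) and then call dropmissing than to call dropmissing directly: 0.56 seconds versus 0.82 seconds, even though the copying variant allocates more memory (2.657 GiB versus 1.822 GiB).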

bkamins, Jan 26 '23 16:01