Improve quantile performance v2
This is an alternative implementation to #86.
Following https://github.com/JuliaStats/Statistics.jl/pull/86#discussion_r704274221, here I perform the partial sorting incrementally.
I make this a separate PR to allow an easy comparison of the two approaches. Either one or the other should be merged.
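To make the idea concrete, here is a minimal sketch of the incremental scheme (illustrative only, not the exact PR code; the function name and the linear-interpolation rule, matching the default quantile definition, are my assumptions): the probabilities are visited in increasing order, and each step partially sorts only the still-unsorted tail of the data up to the largest index that quantile needs, so later quantiles reuse the ordering established for earlier ones.

```julia
# Illustrative sketch of incremental partial sorting (hypothetical name,
# not the PR's actual code). Assumes all p[i] are in [0, 1].
function incremental_quantiles(v::AbstractVector, p::AbstractVector{<:Real})
    vs = copy(v)
    n = length(vs)
    out = similar(p, Float64)
    lo = 1                           # vs[1:lo-1] is already in final sorted position
    for i in sortperm(p)             # visit the probabilities in increasing order
        h = (n - 1) * p[i] + 1       # fractional index of the quantile (default definition)
        fl = floor(Int, h)
        hi = min(fl + 1, n)          # we need vs[1:hi] in sorted order
        if hi >= lo
            # only the still-unsorted tail vs[lo:n] needs work; after this call
            # vs[lo:hi] holds the order statistics lo..hi of the whole vector
            partialsort!(view(vs, lo:n), 1:(hi - lo + 1))
            lo = hi + 1
        end
        out[i] = vs[fl] + (h - fl) * (vs[hi] - vs[fl])  # linear interpolation
    end
    return out
end
```

For example, `incremental_quantiles(rand(10^6), [0.1, 0.5, 0.9])` performs three nested partial sorts instead of one full sort.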
Codecov Report
Merging #91 (fb4c8d8) into master (74897fe) will increase coverage by 0.02%. The diff coverage is 100.00%.
@@            Coverage Diff             @@
##           master      #91      +/-   ##
==========================================
+ Coverage   96.89%   96.91%   +0.02%
==========================================
  Files           1        1
  Lines         419      422       +3
==========================================
+ Hits          406      409       +3
  Misses         13       13
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Statistics.jl | 96.91% <100.00%> (+0.02%) | :arrow_up: |
The only drawback of this approach is the case when very many quantiles are requested: we sort p and then perform many partial sorts. Maybe we should use some threshold on the length of p and fall back to the old algorithm when it is long?
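For illustration, such a fallback could be a simple branch like the one below (the names and the cutoff value are hypothetical, and the right threshold would itself need benchmarking; `incremental_quantiles` refers to the sketch above):

```julia
using Statistics

# Hypothetical dispatch between the two strategies; the cutoff is a placeholder.
function quantiles_with_fallback!(v::AbstractVector, p::AbstractVector{<:Real};
                                  cutoff::Integer = 64)
    if length(p) > cutoff
        sort!(v)                                        # old approach: one full sort
        return [quantile(v, pi; sorted=true) for pi in p]
    else
        return incremental_quantiles(v, p)              # new approach: nested partial sorts
    end
end
```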
Thanks. Is there any reason to think that a series of partial sorts of nested subsets of the data would be significantly slower than a single full sort? Have you tried benchmarking this?
> Is there any reason to think that a series of partial sorts of nested subsets of the data would be significantly slower than a single full sort?
We have to sort p, so if its length is comparable to the length of the vector we are analyzing, we essentially have to perform the sorting twice (sorting p alone is O(m log m), on top of the partial sorts of the data). I will try to benchmark this and post the results.
Here are the benchmarks:
julia> function f1(n)
           x = rand(10^6); x2 = copy(x)  # identical copies so each method sees unsorted data
           p = rand(n)
           @time Statistics._quantilesort!(x, false, extrema(p)...)  # current approach: one partial sort spanning extrema(p)
           @time new_quantilesort!(x2, false, p)                     # this PR: incremental partial sorts
           nothing
       end
f1 (generic function with 1 method)
julia> f1(5)
0.104513 seconds
0.018321 seconds (1 allocation: 128 bytes)
julia> f1(5)
0.075638 seconds
0.015375 seconds (1 allocation: 128 bytes)
julia> f1(5)
0.100432 seconds
0.018790 seconds (1 allocation: 128 bytes)
julia> f1(50)
0.099497 seconds
0.130415 seconds (1 allocation: 496 bytes)
julia> f1(50)
0.101801 seconds
0.106628 seconds (1 allocation: 496 bytes)
julia> f1(50)
0.109406 seconds
0.105568 seconds (1 allocation: 496 bytes)
julia> f1(500)
0.112124 seconds
1.050222 seconds (1 allocation: 4.062 KiB)
julia> f1(500)
0.106582 seconds
1.027544 seconds (1 allocation: 4.062 KiB)
So as you can see, the incremental approach starts to deteriorate much faster as the number of requested quantiles grows.
(As usual, it would not hurt if you double-checked this when you have time, as I might have made some error here.)
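For a more careful check, one could use BenchmarkTools with a fresh copy of the data per evaluation, so the second measurement is not affected by the first call having already partially sorted the vector (a sketch; `new_quantilesort!` stands for the routine added in this PR):

```julia
using BenchmarkTools, Statistics

n = 5  # number of requested quantiles
@btime Statistics._quantilesort!(x, false, extrema(p)...) setup=(x = rand(10^6); p = rand($n)) evals=1
@btime new_quantilesort!(x, false, p) setup=(x = rand(10^6); p = rand($n)) evals=1
```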
I'm not sure it's worth worrying about performance when the number of quantiles is large compared to the data. Quantiles don't make a lot of sense in that case.
Maybe a simple optimization is to do `sort_p = issorted(p) ? p : sort(p)` to avoid making a copy when possible (as p will already be sorted almost all the time).
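For illustration, `issorted` is a cheap O(m) scan, so the check costs essentially nothing relative to the copy-and-sort it avoids (hypothetical helper name):

```julia
# Return p itself when already sorted (the common case); otherwise a sorted copy.
# Safe as long as the caller never mutates the returned vector.
sorted_probs(p::AbstractVector{<:Real}) = issorted(p) ? p : sort(p)
```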