Distributions.jl icon indicating copy to clipboard operation
Distributions.jl copied to clipboard

Dispatch for drawing multiples

Open mmikhasenko opened this issue 8 months ago • 7 comments

Closes #1984

Implementation

  • [x] Dispatch for MixtureModel
  • [x] Dispatch for Truncated

Checkpoints

  • [x] Tests are passing
  • [x] Docs are updated -- discussed that docstrings are placed to dispatched functions for consistency (#1986)

Sanity checks

Speed

For mixture model,

julia> using Distributions
julia> using BenchmarkTools

julia> d = MixtureModel([Normal(1,0.1), Normal(2,0.1), Normal(3,0.1)], [0.3,0.3,0.4])
julia> @btime rand($d, 10_000);
  73.708 μs (7 allocations: 125.67 KiB)

julia> @btime [rand($d) for _ in 1:10_000];
  124.125 μs (3 allocations: 96.06 KiB)

For truncated

t = truncated(Normal(), 0, 3)
@btime rand($t, 10_000);  # 114.625 μs (6 allocations: 256.12 KiB)
@btime [rand($t) for _ in 1:10_000];  # 169.875 μs (3 allocations: 96.06 KiB)

Visual

d = MixtureModel([Normal(1,0.1), Normal(2,0.1), Normal(3,0.1)], [0.3,0.3,0.4])
s1 = [rand(d) for _ in 1:10000]
s2 = rand(d, 10000)
let
    plot(); 
    stephist!(s1, bins=range(0, 4, 100), lab="rand(d) n times")
    stephist!(s2, bins=range(0, 4, 100), lab="rand(d,n)")
end

image

t = truncated(Normal(), 0, 3)
r1 = [rand(t) for _ in 1:10000]
r2 = rand(t, 10000)
let
    plot(); 
    stephist!(t1, bins=range(0, 4, 100), lab="rand(d) n times")
    stephist!(t2, bins=range(0, 4, 100), lab="rand(d,n)")
end

image

mmikhasenko avatar Jun 17 '25 20:06 mmikhasenko

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 86.26%. Comparing base (abb151c) to head (97e3b25).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1985      +/-   ##
==========================================
- Coverage   86.28%   86.26%   -0.03%     
==========================================
  Files         146      146              
  Lines        8787     8847      +60     
==========================================
+ Hits         7582     7632      +50     
- Misses       1205     1215      +10     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Jun 17 '25 21:06 codecov-commenter

@devmotion thanks a lot for the review.

  • [x] Questions should be addressed.
  • [x] truncated of small mass: I've fixed broken pipeline for large truncated of small remaining volume - now, it falls back to rand(n) if needs to generate more than 10M, this cut value is somewhat arbitrary, so suggestions are welcome.
  • [x] docstrings are removed.
  • [x] made sanity checks of comparing samples of rand(d) and rand(d,n)

No docs are added

mmikhasenko avatar Jun 18 '25 17:06 mmikhasenko

The last item that I think about is the 10M cap for switching between algorithm. It's bad to switch between algorithms in general for predictability, so the design of Truncated with a switch is not ideal. But we will not address it in this MR.

I'm thinking to proceed with a similar if/else of 0.25% of distribution. Similar (not-ideal) patterns will be easy to identify in feature and update.

mmikhasenko avatar Jun 19 '25 20:06 mmikhasenko

@devmotion thanks for the last comments.

  • the truncated was easy to fix,
  • I've modified the code rand + resize strategy for the mixture model

mmikhasenko avatar Jun 23 '25 19:06 mmikhasenko

Re-evaluated benchmarks in the header, the rand-n version is faster for both models

mmikhasenko avatar Jun 23 '25 19:06 mmikhasenko

Comments are resolved, this PR is ready for review

mmikhasenko avatar Jun 25 '25 09:06 mmikhasenko

As a general comment before any detailed review:

the rand-n version is faster for both models

Can we test different number of samples? n = 10000 is a bit extreme. Can we check n = 1, n = 5, n = 10, n = 50, n = 100, n = 500, n = 1000 etc. as well?

devmotion avatar Jun 25 '25 09:06 devmotion