`mirai` over `future`?
Hi Phil,
I was wondering whether it might be a good idea to follow what purrr is doing and introduce a direct mirai parallelism backend to SimDesign, rather than going through future?
https://tidyverse.org/blog/2025/07/purrr-1-1-0-parallel/#production-ready-with-mirai
Wouldn't this already be supported via future.mirai? https://future.mirai.futureverse.org/
Of course, a purrr::map() could be put in the main function distributor since it's basically just lapply() logic, but I wonder whether that's worth the effort. What's the performance benefit of doing this over and above what's currently supported using parallel or future?
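Something roughly like the following should already route futures through mirai daemons (a minimal sketch, assuming future.mirai is installed; and, if I recall the API correctly, SimDesign's existing future support would then dispatch through whatever plan() is set):

# set a mirai-backed future plan; any future-based code then runs on mirai daemons
future::plan(future.mirai::mirai_multisession, workers = 2)
f <- future::future(Sys.getpid())
future::value(f)
future::plan(future::sequential)  # reset when done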
I think I communicated my idea poorly. Based on that original link, my understanding is that purrr has found a way to bypass future and use mirai directly.
And what they're claiming is that using mirai directly is better than having any dependency on future whatsoever. As they wrote:
Compared to the furrr package:
- Much lower overhead means you can get a performance boost even for relatively fast functions
- More linear scaling means you get the same benefits whether you’re running on 2 or 200 cores
So what I'm wondering is whether SimDesign might benefit from copying purrr's use of mirai, without necessarily using purrr itself.
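If I'm reading that post correctly, their approach boils down to something like this (a rough sketch, assuming purrr >= 1.1.0 and mirai are installed):

library(purrr)
library(mirai)
daemons(4)                              # launch 4 persistent background workers
res <- map(1:8, in_parallel(\(x) x^2))  # purrr ships each call straight to the daemons, no future layer
daemons(0)                              # shut the workers down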
You can use mirai in a number of ways already, not just in its canonical form. For instance, if you define a cluster object with mirai::make_cluster() and pass that to the cl argument, you get the benefits of mirai without changing anything else in the package, since that cluster works with functions like parLapply().
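For example, something along these lines should already work today; a toy sketch, assuming runSimulation()'s cl argument accepts the miraiCluster the same way it accepts a cluster from parallel::makeCluster():

library(SimDesign)

Design <- createDesign(N = c(100, 200))
Generate <- function(condition, fixed_objects = NULL) rnorm(condition$N)
Analyse <- function(condition, dat, fixed_objects = NULL) c(mean = mean(dat))
Summarise <- function(condition, results, fixed_objects = NULL) colMeans(results)

cl <- mirai::make_cluster(2)   # stands in for parallel::makeCluster(2)
res <- runSimulation(Design, replications = 100, generate = Generate,
                     analyse = Analyse, summarise = Summarise,
                     parallel = TRUE, cl = cl)
mirai::stop_cluster(cl)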
From what I've gathered, mirai seems to work really well when the parallel jobs are short, where the efficiency of the job distribution can yield large benefits as the microseconds add up (particularly important for interactive applets like shiny). For larger jobs, like a simulation replicate where each execution isn't exactly trivial, it's unlikely you'll see particularly large improvements. Nevertheless, it's tempting to switch the automatically generated cluster from parallel to the one available from mirai, as then parallel = TRUE could be made the default since there wouldn't be as much of a performance hit in cases where parallel processing is generally unnecessary.
I'll have to think about whether that's worth the complexity, as I generally prefer leaving parallel processing off until run time. And of course, further unit testing would be required, as there's a chance this could break something, or the overall improvement could turn out to be so small that the effect is trivial.
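To make the idea concrete, the kind of switch I have in mind would be roughly the following; this is purely hypothetical, and neither the helper nor the type flag exists in the package:

# hypothetical helper: pick which implementation backs the auto-generated cluster
make_sim_cluster <- function(ncores, type = c("parallel", "mirai")) {
  type <- match.arg(type)
  if (type == "mirai") mirai::make_cluster(ncores) else parallel::makeCluster(ncores)
}

cl <- make_sim_cluster(2, type = "mirai")
parallel::parLapply(cl, 1:4, function(i) i * 2)
mirai::stop_cluster(cl)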
Here's a quick overview of the mirai approach vs parallel with simple simulations. As I suspected, for simulations that take a non-trivial amount of time (the majority of them) there's barely any difference between the approaches. Nevertheless, some very small differences are detectable, and across full simulation experiments these would add up to a few seconds or minutes saved, so in that sense I wouldn't mind adding a flag to the package to switch between the mirai and parallel cluster definitions. See below.
library(parallel)
library(mirai)
base <- parallel::makeCluster(4)
mirai <- mirai::make_cluster(4)
#####################
# test with simple sampling from rpois() with n=1 and n=100
system.time(rpois(n=1, lambda=1))
#> user system elapsed
#> 0 0 0
# N = 1 per sample
x <- rep(1, 1000)
res <- microbenchmark::microbenchmark(
parLapply(base, x, rpois, lambda=1),
lapply(x, rpois, lambda = 1),
parLapply(mirai, x, rpois, lambda=1)
)
ggplot2::autoplot(res) + ggplot2::theme_minimal()
#> Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
#> ℹ Please use tidy evaluation idioms with `aes()`.
#> ℹ See also `vignette("ggplot2-in-packages")` for more information.
#> ℹ The deprecated feature was likely used in the microbenchmark package.
#> Please report the issue at
#> <https://github.com/joshuaulrich/microbenchmark/issues/>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

x <- rep(100, 1000)
system.time(rpois(n=100, lambda=1))
#> user system elapsed
#> 0 0 0
res <- microbenchmark::microbenchmark(
parLapply(base, x, rpois, lambda=1),
lapply(x, rpois, lambda=1),
parLapply(mirai, x, rpois, lambda=1)
)
ggplot2::autoplot(res) + ggplot2::theme_minimal()

#############
# test with a simple simulation experiment:
# - one factor FA, sample data from MVN, fit model
# - Use N=100 and N=1000
myfun <- function(n){
  # population covariance matrix implied by a one-factor model
  F <- matrix(rep(.6, 10))
  Psi <- diag(1 - rowSums(F^2))
  R <- F %*% t(F) + Psi
  # generate MVN data and fit a single-factor EFA
  dat <- SimDesign::rmvnorm(n, sigma=R)
  fa <- psych::fa(dat, 1)
  fa
}
invisible(myfun(100))
system.time(myfun(100))
#> user system elapsed
#> 0.016 0.000 0.016
system.time(myfun(1000))
#> user system elapsed
#> 0.012 0.000 0.012
## now with a simulation experiment
x <- rep(100, 1000)
res <- microbenchmark::microbenchmark(
parLapply(base, x, myfun),
lapply(x, myfun),
parLapply(mirai, x, myfun)
)
res
#> Unit: seconds
#> expr min lq mean median uq
#> parLapply(base, x, myfun) 2.443161 2.541470 2.589598 2.570586 2.607769
#> lapply(x, myfun) 9.473621 9.603256 9.683442 9.681214 9.756624
#> parLapply(mirai, x, myfun) 2.385631 2.455194 2.522000 2.498668 2.557127
#> max neval cld
#> 3.616511 100 a
#> 10.072631 100 b
#> 3.422199 100 c
ggplot2::autoplot(res) + ggplot2::theme_minimal()

# is this actually faster?
sub <- subset(res, grepl('parLapply', res$expr))
t.test(time ~ expr, sub)
#>
#> Welch Two Sample t-test
#>
#> data: time by expr
#> t = 3.8138, df = 197.83, p-value = 0.0001828
#> alternative hypothesis: true difference in means between group parLapply(base, x, myfun) and group parLapply(mirai, x, myfun) is not equal to 0
#> 95 percent confidence interval:
#> 32644271 102550447
#> sample estimates:
#> mean in group parLapply(base, x, myfun)
#> 2589597586
#> mean in group parLapply(mirai, x, myfun)
#> 2522000227
# with N = 1000
x <- rep(1000, 1000)
res2 <- microbenchmark::microbenchmark(
parLapply(base, x, myfun),
lapply(x, myfun),
parLapply(mirai, x, myfun)
)
res2
#> Unit: seconds
#> expr min lq mean median uq
#> parLapply(base, x, myfun) 2.987846 3.077223 3.133002 3.115353 3.162935
#> lapply(x, myfun) 11.377627 11.601821 11.767714 11.737279 11.928104
#> parLapply(mirai, x, myfun) 2.946469 2.992987 3.037530 3.015425 3.076472
#> max neval cld
#> 3.459101 100 a
#> 12.594099 100 b
#> 3.370846 100 c
ggplot2::autoplot(res2) + ggplot2::theme_minimal()

# is this actually faster?
sub <- subset(res2, grepl('parLapply', res2$expr))
t.test(time ~ expr, sub)
#>
#> Welch Two Sample t-test
#>
#> data: time by expr
#> t = 8.1662, df = 191.89, p-value = 4.164e-14
#> alternative hypothesis: true difference in means between group parLapply(base, x, myfun) and group parLapply(mirai, x, myfun) is not equal to 0
#> 95 percent confidence interval:
#> 72411939 118530970
#> sample estimates:
#> mean in group parLapply(base, x, myfun)
#> 3133001835
#> mean in group parLapply(mirai, x, myfun)
#> 3037530381
stopCluster(base)
stop_cluster(mirai)
#############
# redefine using everything available
nc <- parallel::detectCores() - 1
base <- parallel::makeCluster(nc)
mirai <- mirai::make_cluster(nc)
# N = 1 per sample
x <- rep(1, 1000)
res <- microbenchmark::microbenchmark(
parLapply(base, x, rpois, lambda=1),
lapply(x, rpois, lambda = 1),
parLapply(mirai, x, rpois, lambda=1)
)
ggplot2::autoplot(res) + ggplot2::theme_minimal()

x <- rep(100, 1000)
system.time(rpois(n=100, lambda=1))
#> user system elapsed
#> 0 0 0
res <- microbenchmark::microbenchmark(
parLapply(base, x, rpois, lambda=1),
lapply(x, rpois, lambda=1),
parLapply(mirai, x, rpois, lambda=1)
)
ggplot2::autoplot(res) + ggplot2::theme_minimal()

## now with a simulation experiment
x <- rep(100, 1000)
res <- microbenchmark::microbenchmark(
parLapply(base, x, myfun),
lapply(x, myfun),
parLapply(mirai, x, myfun)
)
res
#> Unit: milliseconds
#> expr min lq mean median uq
#> parLapply(base, x, myfun) 491.9251 522.1341 566.6943 539.1654 553.2034
#> lapply(x, myfun) 9260.3089 9472.1508 9637.1479 9631.9655 9752.0433
#> parLapply(mirai, x, myfun) 455.7531 472.9182 502.3617 482.0346 490.8194
#> max neval cld
#> 2511.124 100 a
#> 10355.358 100 b
#> 1986.198 100 a
ggplot2::autoplot(res) + ggplot2::theme_minimal()

Created on 2025-11-06 with reprex v2.1.1
Interesting!