`mirai` over `future`?
Hi Phil,
I was wondering whether it might be a good idea to follow what purrr is doing and introduce a direct mirai parallelism backend to SimDesign, rather than going through future?
https://tidyverse.org/blog/2025/07/purrr-1-1-0-parallel/#production-ready-with-mirai
Wouldn't this already be supported via future.mirai? https://future.mirai.futureverse.org/
Of course, a purrr::map() could be put in the main function distributor since it's basically just lapply() logic, but I wonder whether that's worth the effort. What's the performance benefit of doing this over and above what's currently supported using parallel or future?
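Something roughly like the following should already route futures through mirai daemons (a minimal sketch, assuming future.mirai is installed; and, if I recall the API correctly, SimDesign's existing future support would then dispatch through whatever plan() is set):

# set a mirai-backed future plan; any future-based code then runs on mirai daemons
future::plan(future.mirai::mirai_multisession, workers = 2)
f <- future::future(Sys.getpid())
future::value(f)
future::plan(future::sequential)  # reset when done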
I think I communicated my idea poorly. Based on that original link, my understanding is that purrr has found a way to bypass future and use mirai directly.
And what they're claiming is that using mirai directly is better than having any dependency on future whatsoever. As they wrote:
Compared to the furrr package:
- Much lower overhead means you can get a performance boost even for relatively fast functions
- More linear scaling means you get the same benefits whether you’re running on 2 or 200 cores
So what I'm wondering is whether SimDesign might benefit from copying purrr's use of mirai, without necessarily using purrr itself.
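If I'm reading that post correctly, their approach boils down to something like this (a rough sketch, assuming purrr >= 1.1.0 and mirai are installed):

library(purrr)
library(mirai)
daemons(4)                              # launch 4 persistent background workers
res <- map(1:8, in_parallel(\(x) x^2))  # purrr ships each call straight to the daemons, no future layer
daemons(0)                              # shut the workers down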
You can use mirai in a number of ways already, not just in its canonical form. For instance, if you define a cluster object with mirai::make_cluster() and pass that to the cl argument, you get the benefits of mirai without changing anything else in the package, since that cluster works with functions like parLapply().
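For example, something along these lines should already work today; a toy sketch, assuming runSimulation()'s cl argument accepts the miraiCluster the same way it accepts a cluster from parallel::makeCluster():

library(SimDesign)

Design <- createDesign(N = c(100, 200))
Generate <- function(condition, fixed_objects = NULL) rnorm(condition$N)
Analyse <- function(condition, dat, fixed_objects = NULL) c(mean = mean(dat))
Summarise <- function(condition, results, fixed_objects = NULL) colMeans(results)

cl <- mirai::make_cluster(2)   # stands in for parallel::makeCluster(2)
res <- runSimulation(Design, replications = 100, generate = Generate,
                     analyse = Analyse, summarise = Summarise,
                     parallel = TRUE, cl = cl)
mirai::stop_cluster(cl)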
From what I've gathered, mirai seems to work really well when the parallel jobs are short, where the efficiency of the job distribution can yield large benefits as the microseconds add up (particularly important for interactive applets like shiny). For larger jobs, like a simulation replicate where each execution isn't exactly trivial, it's unlikely you'll see particularly large improvements. Nevertheless, it's tempting to switch the automatically generated cluster from parallel to the one available from mirai, as then parallel = TRUE could be made the default since there wouldn't be as much of a performance hit in cases where parallel processing is generally unnecessary.
I'll have to think about whether that's worth the complexity, as I generally prefer leaving parallel processing off until run time. And of course, further unit testing would be required, as there's a chance this could break something, or the overall improvement could turn out to be so small that the effect is trivial.
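To make the idea concrete, the kind of switch I have in mind would be roughly the following; this is purely hypothetical, and neither the helper nor the type flag exists in the package:

# hypothetical helper: pick which implementation backs the auto-generated cluster
make_sim_cluster <- function(ncores, type = c("parallel", "mirai")) {
  type <- match.arg(type)
  if (type == "mirai") mirai::make_cluster(ncores) else parallel::makeCluster(ncores)
}

cl <- make_sim_cluster(2, type = "mirai")
parallel::parLapply(cl, 1:4, function(i) i * 2)
mirai::stop_cluster(cl)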
Here's a quick overview of the mirai approach vs parallel with simple simulations. As I suspected, for simulations that take a non-trivial amount of time (the majority of them) there's barely any difference between the approaches. Nevertheless, some very small differences are detectable, and across full simulation experiments these would add up to a few seconds or minutes saved, so in that sense I wouldn't mind adding a flag to the package to switch between the mirai and parallel cluster definitions. See below.
library(parallel)
library(mirai)
base <- parallel::makeCluster(4)
mirai <- mirai::make_cluster(4)
#####################
# test with simple sampling from rpois() with n=1 and n=100
system.time(rpois(n=1, lambda=1))
#> user system elapsed
#> 0 0 0
# N = 1 per sample
x <- rep(1, 1000)
res <- microbenchmark::microbenchmark(
parLapply(base, x, rpois, lambda=1),
lapply(x, rpois, lambda = 1),
parLapply(mirai, x, rpois, lambda=1)
)
ggplot2::autoplot(res) + ggplot2::theme_minimal()
#> Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
#> ℹ Please use tidy evaluation idioms with `aes()`.
#> ℹ See also `vignette("ggplot2-in-packages")` for more information.
#> ℹ The deprecated feature was likely used in the microbenchmark package.
#> Please report the issue at
#> <https://github.com/joshuaulrich/microbenchmark/issues/>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

x <- rep(100, 1000)
system.time(rpois(n=100, lambda=1))
#> user system elapsed
#> 0 0 0
res <- microbenchmark::microbenchmark(
parLapply(base, x, rpois, lambda=1),
lapply(x, rpois, lambda=1),
parLapply(mirai, x, rpois, lambda=1)
)
ggplot2::autoplot(res) + ggplot2::theme_minimal()

#############
# test with a simple simulation experiment:
# - one factor FA, sample data from MVN, fit model
# - Use N=100 and N=1000
myfun <- function(n){
  # population covariance matrix implied by a one-factor model
  F <- matrix(rep(.6, 10))
  Psi <- diag(1 - rowSums(F^2))
  R <- F %*% t(F) + Psi
  # generate MVN data and fit a single-factor EFA
  dat <- SimDesign::rmvnorm(n, sigma=R)
  fa <- psych::fa(dat, 1)
  fa
}
invisible(myfun(100))
system.time(myfun(100))
#> user system elapsed
#> 0.016 0.000 0.016
system.time(myfun(1000))
#> user system elapsed
#> 0.012 0.000 0.012
## now with a simulation experiment
x <- rep(100, 1000)
res <- microbenchmark::microbenchmark(
parLapply(base, x, myfun),
lapply(x, myfun),
parLapply(mirai, x, myfun)
)
res
#> Unit: seconds
#> expr min lq mean median uq
#> parLapply(base, x, myfun) 2.443161 2.541470 2.589598 2.570586 2.607769
#> lapply(x, myfun) 9.473621 9.603256 9.683442 9.681214 9.756624
#> parLapply(mirai, x, myfun) 2.385631 2.455194 2.522000 2.498668 2.557127
#> max neval cld
#> 3.616511 100 a
#> 10.072631 100 b
#> 3.422199 100 c
ggplot2::autoplot(res) + ggplot2::theme_minimal()

# is this actually faster?
sub <- subset(res, grepl('parLapply', res$expr))
t.test(time ~ expr, sub)
#>
#> Welch Two Sample t-test
#>
#> data: time by expr
#> t = 3.8138, df = 197.83, p-value = 0.0001828
#> alternative hypothesis: true difference in means between group parLapply(base, x, myfun) and group parLapply(mirai, x, myfun) is not equal to 0
#> 95 percent confidence interval:
#> 32644271 102550447
#> sample estimates:
#> mean in group parLapply(base, x, myfun)
#> 2589597586
#> mean in group parLapply(mirai, x, myfun)
#> 2522000227
# with N = 1000
x <- rep(1000, 1000)
res2 <- microbenchmark::microbenchmark(
parLapply(base, x, myfun),
lapply(x, myfun),
parLapply(mirai, x, myfun)
)
res2
#> Unit: seconds
#> expr min lq mean median uq
#> parLapply(base, x, myfun) 2.987846 3.077223 3.133002 3.115353 3.162935
#> lapply(x, myfun) 11.377627 11.601821 11.767714 11.737279 11.928104
#> parLapply(mirai, x, myfun) 2.946469 2.992987 3.037530 3.015425 3.076472
#> max neval cld
#> 3.459101 100 a
#> 12.594099 100 b
#> 3.370846 100 c
ggplot2::autoplot(res2) + ggplot2::theme_minimal()

# is this actually faster?
sub <- subset(res2, grepl('parLapply', res2$expr))
t.test(time ~ expr, sub)
#>
#> Welch Two Sample t-test
#>
#> data: time by expr
#> t = 8.1662, df = 191.89, p-value = 4.164e-14
#> alternative hypothesis: true difference in means between group parLapply(base, x, myfun) and group parLapply(mirai, x, myfun) is not equal to 0
#> 95 percent confidence interval:
#> 72411939 118530970
#> sample estimates:
#> mean in group parLapply(base, x, myfun)
#> 3133001835
#> mean in group parLapply(mirai, x, myfun)
#> 3037530381
stopCluster(base)
stop_cluster(mirai)
#############
# redefine using everything available
nc <- parallel::detectCores() - 1
base <- parallel::makeCluster(nc)
mirai <- mirai::make_cluster(nc)
# N = 1 per sample
x <- rep(1, 1000)
res <- microbenchmark::microbenchmark(
parLapply(base, x, rpois, lambda=1),
lapply(x, rpois, lambda = 1),
parLapply(mirai, x, rpois, lambda=1)
)
ggplot2::autoplot(res) + ggplot2::theme_minimal()

x <- rep(100, 1000)
system.time(rpois(n=100, lambda=1))
#> user system elapsed
#> 0 0 0
res <- microbenchmark::microbenchmark(
parLapply(base, x, rpois, lambda=1),
lapply(x, rpois, lambda=1),
parLapply(mirai, x, rpois, lambda=1)
)
ggplot2::autoplot(res) + ggplot2::theme_minimal()

## now with a simulation experiment
x <- rep(100, 1000)
res <- microbenchmark::microbenchmark(
parLapply(base, x, myfun),
lapply(x, myfun),
parLapply(mirai, x, myfun)
)
res
#> Unit: milliseconds
#> expr min lq mean median uq
#> parLapply(base, x, myfun) 491.9251 522.1341 566.6943 539.1654 553.2034
#> lapply(x, myfun) 9260.3089 9472.1508 9637.1479 9631.9655 9752.0433
#> parLapply(mirai, x, myfun) 455.7531 472.9182 502.3617 482.0346 490.8194
#> max neval cld
#> 2511.124 100 a
#> 10355.358 100 b
#> 1986.198 100 a
ggplot2::autoplot(res) + ggplot2::theme_minimal()

Created on 2025-11-06 with reprex v2.1.1
Interesting!