R-session crashes after successful sampling of a model with many parameters
I submitted this as a question on the Stan forums, but I have since re-created it on a few different Windows machines, so I think it is a genuine bug. Here is what I hope is a useful bug report.
The bug
cmdstanr crashes the R-session after successfully sampling from a model with many parameters. I suspect it has something to do with how cmdstanr reads or summarises the sampling output. A clean R-session also crashes when trying to read the stored CSV files with cmdstanr::read_cmdstan_csv() or cmdstanr::as_cmdstan_fit(). However, the same CSV files can be read successfully with rstan::read_stan_csv(), so I'm confident that the model fitting itself was successful.
This has come up in a project working with a large database of bird observations from the last 56 years: The North American Breeding Bird Survey. The models from that project work fine for bird species with ~50-60K observations, but this R-crash occurs for the more data-rich species with ~100K observations (which result in ~250K parameters). I'd like to be able to apply my model to all of the species in the database, and to stick with cmdstanr for my entire workflow, and of course I'd also like it if the R-session didn't crash after fitting a model.
Reproducible Example
Here's a simple reproducible example that suggests the crash is related to the number of parameters. The model is a simple linear regression with 250K observations; the pointwise log_lik in generated quantities adds 250K columns to the output.
library(cmdstanr)

N <- 250000
x <- rnorm(N)
y <- 1.5 * x + rnorm(N, 0, 0.3)
stan_data <- list(N = N,
                  y = y,
                  x = x)
model_code <- "
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real a;
  real b;
  real<lower=0> sigma;
}
model {
  sigma ~ student_t(3, 0, 1);
  b ~ std_normal();
  a ~ std_normal();
  y ~ normal(a + b * x, sigma);
}
generated quantities {
  vector[N] log_lik;
  for (i in 1:N) {
    log_lik[i] = normal_lpdf(y[i] | a + b * x[i], sigma);
  }
}
"
mod <- "simple_regression.stan"
cat(model_code, file = mod)
model <- cmdstan_model(mod)
Crashes after fitting
This call to model$sample() crashes the R-session after sampling is complete. The CSV output files are stored. It takes ~10 minutes to sample and write the files; then, with no errors or warnings, the R-session crashes. The crash happens both in a stand-alone R-session and in RStudio.
output <- getwd()
stanfit <- model$sample(
  data = stan_data,
  refresh = 200,
  chains = 4,
  iter_sampling = 1000,
  iter_warmup = 1000,
  parallel_chains = 4,
  output_dir = output,
  output_basename = "simple_regression_fit")
The csv files can be read with rstan
This rstan::read_stan_csv call works, although it takes a long time to read in the files.
csv_files <- paste0("simple_regression_fit-",1:4,".csv")
stanfit <- rstan::read_stan_csv(csv_files, col_major = TRUE) ## successful reading of csv files with rstan
But trying to read or load the files with cmdstanr causes R-crash
Trying to read in the CSV files with cmdstanr causes the R-session to crash. The crash happens quickly (within a few seconds), with no indication from the operating system of a memory issue or any other problem, and no error message. The session crashes both in a stand-alone R-session and in RStudio.
### this as_cmdstan_fit call crashes the R-session
stanfit <- as_cmdstan_fit(files = csv_files)
### similarly, this read_cmdstan_csv call crashes the R-session
stanfit <- read_cmdstan_csv(
  files = csv_files,
  variables = "",
  sampler_diagnostics = NULL,
  format = "draws_list") # following note about efficiency in ?cmdstanr::draws
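One way to probe whether the crash depends on the number of columns being read might be to ask cmdstanr for only the three scalar parameters. This is a minimal sketch, not something tested in this thread: read_scalars_only is a hypothetical helper name, and whether the restricted read actually avoids the crash is unknown. It relies only on the documented variables and sampler_diagnostics arguments of read_cmdstan_csv(), where an empty string means "read none".

```r
# Hypothetical helper (not part of cmdstanr): read only a, b and sigma,
# skipping the 250K log_lik columns and all sampler diagnostics.
read_scalars_only <- function(files) {
  cmdstanr::read_cmdstan_csv(
    files = files,
    variables = c("a", "b", "sigma"),
    sampler_diagnostics = ""   # empty string: read no diagnostic columns
  )
}

# Usage (assumes the CSV files from the sampling step are in the working directory):
# small_fit <- read_scalars_only(paste0("simple_regression_fit-", 1:4, ".csv"))
```

If this restricted read succeeds where the full read crashes, that would point towards the width of the output rather than the files themselves.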
Session info
Running on a Windows computer with 16 cores and 128 GB of RAM, so I don't think memory is the issue.
utils::sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstan_2.21.5 ggplot2_3.3.6 StanHeaders_2.21.0-7 cmdstanr_0.5.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 pillar_1.7.0 compiler_4.2.0 prettyunits_1.1.1 tools_4.2.0 pkgbuild_1.3.1 jsonlite_1.8.0 lifecycle_1.0.1
[9] tibble_3.1.7 gtable_0.3.0 checkmate_2.1.0 pkgconfig_2.0.3 rlang_1.0.2 cli_3.3.0 DBI_1.1.3 parallel_4.2.0
[17] xfun_0.31 loo_2.5.1 gridExtra_2.3 withr_2.5.0 dplyr_1.0.9 knitr_1.39 generics_0.1.2 vctrs_0.4.1
[25] stats4_4.2.0 grid_4.2.0 tidyselect_1.1.2 inline_0.3.19 glue_1.6.2 R6_2.5.1 processx_3.6.1 fansi_1.0.3
[33] distributional_0.3.0 tensorA_0.36.2 callr_3.7.0 farver_2.1.0 purrr_0.3.4 posterior_1.2.2 magrittr_2.0.3 codetools_0.2-18
[41] matrixStats_0.62.0 ps_1.7.1 backports_1.4.1 scales_1.2.0 ellipsis_0.3.2 abind_1.4-5 assertthat_0.2.1 colorspace_2.0-3
[49] utf8_1.2.2 RcppParallel_5.1.5 munsell_0.5.0 crayon_1.5.1
Thanks for the report. Fixing this might require some more assistance from your side, as I was unable to reproduce the crash on my machines. I tried both a Mac and a Windows machine.
Could you first try running:
stanfit <- model$sample(
  data = stan_data,
  refresh = 200,
  chains = 4,
  iter_sampling = 1000,
  iter_warmup = 1000,
  parallel_chains = 4,
  output_dir = output,
  output_basename = "simple_regression_fit",
  diagnostics = NULL
)
Note the added diagnostics = NULL above. This prevents cmdstanr from reading in the divergences, treedepth and E-BFMI columns from the CSV files after sampling finishes.
Assuming the above completes fine, please run the following afterwards:
res <- list()
i <- 1
for (output_file in stanfit$output_files()) {
  fread_cmd <- paste0("grep -v '^#' --color=never '", output_file, "'")
  res[[i]] <- data.table::fread(
    cmd = fread_cmd,
    data.table = FALSE
  )
  i <- i + 1
}
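The loop above shells out to grep to strip Stan's '#' comment lines before parsing. To rule out grep itself as a factor on Windows, the same filtering could be sketched in pure R. This is only an illustration under assumptions: read_draws_without_grep is a hypothetical helper, not a cmdstanr function, and it will be slower and use more memory than the grep pipe on files of this size.

```r
# Hypothetical helper (not part of cmdstanr): drop comment and blank lines
# in R rather than via grep, then parse the remaining CSV text with data.table.
read_draws_without_grep <- function(path) {
  lines <- readLines(path, warn = FALSE)
  keep <- nzchar(lines) & !startsWith(lines, "#")
  data.table::fread(text = lines[keep], data.table = FALSE)
}

# Usage (assumes a fitted object named stanfit, as in the loop above):
# res <- lapply(stanfit$output_files(), read_draws_without_grep)
```

If both approaches read the files without trouble, the problem is more likely in what happens after parsing than in the file contents.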
Thanks for your help.
The sample statement with diagnostics = NULL still causes the crash. Same behaviour as the other sample calls, where the sampling seems to finish fine, but then the R-session crashes with no errors or warnings.
After the crash, I started a new R-session in the same working directory and ran the following.
csv_files <- paste0("simple_regression_fit-",1:4,".csv")
res <- list()
i <- 1
for (output_file in csv_files) {
  fread_cmd <- paste0("grep -v '^#' --color=never '", output_file, "'")
  res[[i]] <- data.table::fread(
    cmd = fread_cmd,
    data.table = FALSE
  )
  i <- i + 1
}
That loop runs fine. The result is a large list (~8GB) with 4 elements, each of which is a data frame (see below). I'll see if I can find another Windows machine that doesn't cause the error... So weird.
str(res[[1]])
# $ :'data.frame': 1000 obs. of 250010 variables:
# ..$ lp__ : int [1:1000] 175564 175565 175565 175565 175563 175564 175562 175564 175565 175565 ...
# ..$ accept_stat__ : num [1:1000] 0.934 0.988 0.99 0.597 0.72 ...
# ..$ stepsize__ : num [1:1000] 0.178 0.178 0.178 0.178 0.178 ...
# ..$ treedepth__ : int [1:1000] 3 2 2 2 2 2 2 3 2 2 ...
# ..$ n_leapfrog__ : int [1:1000] 7 3 7 3 3 7 3 7 3 3 ...
# ..$ divergent__ : int [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
# ..$ energy__ : int [1:1000] -175564 -175564 -175564 -175561 -175562 -175562 -175561 -175560 -175564 -175564 ...
# ..$ a : num [1:1000] -8.15e-04 -3.56e-04 -6.82e-05 -3.28e-04 -4.23e-04 ...
# ..$ b : num [1:1000] 1.5 1.5 1.5 1.5 1.5 ...
# ..$ sigma : num [1:1000] 0.301 0.301 0.3 0.3 0.3 ...
# ..$ log_lik.1 : num [1:1000] -0.0705 -0.0707 -0.0717 -0.0708 -0.0663 ...
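As a rough cross-check of the ~8GB figure, the size can be estimated from the dimensions in the str() output. This is a back-of-the-envelope sketch that assumes every value is stored as an 8-byte double; a handful of columns are actually 4-byte integers, so it slightly overestimates.

```r
n_chains <- 4
n_draws <- 1000        # iter_sampling per chain
n_columns <- 250010    # 7 sampler columns + a, b, sigma + 250000 log_lik values
bytes_per_double <- 8

# Total size of the four data frames in gigabytes (1 GB = 1e9 bytes)
total_gb <- n_chains * n_draws * n_columns * bytes_per_double / 1e9
total_gb  # about 8
```

That is well within 128 GB of RAM, although intermediate copies made while reading or converting the draws could multiply the peak usage.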
Good afternoon, just wanted to chime in here to say that I can replicate the original crash on my computer. This is after freshly installing cmdstan this morning. RStudio crashes after sampling is complete for me. Here is the sessionInfo() output from just before sampling:
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cmdstanr_0.5.2
loaded via a namespace (and not attached):
[1] pillar_1.6.4 compiler_4.0.3 tools_4.0.3 jsonlite_1.7.3
[5] lifecycle_1.0.1 tibble_3.1.6 gtable_0.3.0 checkmate_2.0.0
[9] pkgconfig_2.0.3 rlang_0.4.12 DBI_1.1.0 xfun_0.29
[13] withr_2.5.0 dplyr_1.0.7 knitr_1.37 generics_0.1.1
[17] vctrs_0.3.8 grid_4.0.3 tidyselect_1.1.1 glue_1.6.1
[21] R6_2.5.1 processx_3.7.0 fansi_1.0.2 distributional_0.3.0
[25] tensorA_0.36.2 ggplot2_3.3.5 farver_2.1.0 purrr_0.3.4
[29] posterior_1.2.1 blob_1.2.1 magrittr_2.0.1 backports_1.2.0
[33] scales_1.1.1 ps_1.4.0 ellipsis_0.3.2 abind_1.4-5
[37] assertthat_0.2.1 colorspace_2.0-2 utf8_1.2.2 munsell_0.5.0
[41] crayon_1.4.2
@gravesti did you try this already? I cannot try it yet because installation fails for me on Windows
@danielinteractive It runs fine for me with this setup. I'll try again with other versions similar to those above.
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_Switzerland.utf8 LC_CTYPE=English_Switzerland.utf8 LC_MONETARY=English_Switzerland.utf8
[4] LC_NUMERIC=C LC_TIME=English_Switzerland.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cmdstanr_0.5.3
loaded via a namespace (and not attached):
[1] magrittr_2.0.3 munsell_0.5.0 colorspace_2.0-3 R6_2.5.1 rlang_1.0.6
[6] fansi_1.0.3 tools_4.2.2 grid_4.2.2 data.table_1.14.4 checkmate_2.1.0
[11] gtable_0.3.1 utf8_1.2.2 cli_3.4.1 posterior_1.3.1 withr_2.5.0
[16] matrixStats_0.62.0 abind_1.4-5 tibble_3.1.8 lifecycle_1.0.3 processx_3.8.0
[21] tensorA_0.36.2 farver_2.1.1 ggplot2_3.3.6 ps_1.7.2 vctrs_0.5.0
[26] glue_1.6.2 compiler_4.2.2 pillar_1.8.1 generics_0.1.3 scales_1.2.1
[31] backports_1.4.1 distributional_0.3.1 jsonlite_1.8.3 pkgconfig_2.0.3
@rok-cesnovar could it be that this was fixed with newer R versions?
Maybe yeah. Given that a few of us have tried to replicate this issue and were unable to, let's close this. Thanks!