cmdstanr

R-session crashes after successful sampling of a model with many parameters

Open AdamCSmithCWS opened this issue 3 years ago • 3 comments

I submitted this as a question on the Stan forums, but I've since been able to reproduce it on a few different Windows machines, so I think it truly is a bug. Here's what I hope is a useful bug report.

The bug

cmdstanr crashes the R-session after successfully sampling from a model with many parameters. I think it has something to do with how cmdstanr summarises or assesses the sampling output. A clean R-session also crashes when trying to read the stored csv files using cmdstanr::read_cmdstan_csv() or cmdstanr::as_cmdstan_fit(). However, the same stored csv files can be read successfully with rstan::read_stan_csv(), so I'm confident that the model fitting itself was successful.

This has come up in a project working with a large database of bird observations from the last 56 years: the North American Breeding Bird Survey. The models from that project work fine for bird species with ~50-60K observations, but this R crash occurs for the more data-rich species with ~100K observations (which result in ~250K parameters). I'd like to be able to apply my model to all of the species in the database and to stick with cmdstanr for my entire workflow, and of course I'd also like the R-session not to crash after fitting a model.

Reproducible Example

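Here's a simple reproducible example suggesting that the number of saved quantities is what causes the crash. It is a simple linear regression model with 250K data points; the model has only three parameters, but the generated quantities block adds a 250K-element log_lik vector to the output.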

library(cmdstanr)

N = 250000

x = rnorm(N)

y = 1.5*x+rnorm(N,0,0.3)

stan_data <- list(N = N,
                  y = y,
                  x = x)

model_code <- "
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}

parameters {
  real a;
  real b;
  real<lower=0> sigma;
}

model {
  sigma ~ student_t(3, 0, 1);
  b ~ std_normal();
  a ~ std_normal();

  y ~ normal(a + b * x, sigma);
}

generated quantities {
  vector[N] log_lik;

  for (i in 1:N) {
    log_lik[i] = normal_lpdf(y[i] | a + b * x[i], sigma);
  }
}
"

mod <- "simple_regression.stan"

cat(model_code,file = mod)

model <- cmdstan_model(mod)

Crashes after fitting

This call to model$sample crashes the R-session after sampling is complete. The csv output files are stored. It takes ~10 minutes to sample and write the files; then, with no errors or warnings, the R-session crashes. The crash happens both in a stand-alone R-session and in RStudio.


output <- getwd()

stanfit <- model$sample(
  data=stan_data,
  refresh=200,
  chains=4, 
  iter_sampling=1000,
  iter_warmup=1000,
  parallel_chains = 4,
  output_dir = output,
  output_basename = "simple_regression_fit")


The csv files can be read with rstan

This rstan::read_stan_csv call works, although it takes a long time to read in the files.

csv_files <- paste0("simple_regression_fit-",1:4,".csv")
stanfit <- rstan::read_stan_csv(csv_files, col_major = TRUE) ## successful reading of csv files with rstan

But trying to read or load the files with cmdstanr causes R-crash

Trying to read in the csv files with cmdstanr causes the R-session to crash. The crash happens quickly (within a few seconds), with no indication from the operating system of a memory issue or any other problem, and no error message. The session crashes both in a stand-alone R-session and in RStudio.

### this as_cmdstan_fit call crashes the R-session
stanfit <- as_cmdstan_fit(files = csv_files)

### similarly, this read_cmdstan_csv call crashes the R-session
stanfit <- read_cmdstan_csv(
 files = csv_files,
 variables = "",
 sampler_diagnostics = NULL,
 format = "draws_list") # following note about efficiency in ?cmdstanr::draws
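One way to narrow this down might be to read only the three scalar parameters and skip the log_lik columns entirely. The following is a sketch only, untested on the affected machines. It relies on the documented variables and sampler_diagnostics arguments of read_cmdstan_csv(), where an empty string means "read none". The helper name read_scalar_draws is hypothetical and not part of cmdstanr.

```r
# Sketch only: read the three scalar parameters and skip the 250K log_lik
# columns, to check whether the crash depends on how many columns are read.
# `read_scalar_draws` is a hypothetical helper name, not part of cmdstanr.
read_scalar_draws <- function(files, variables = c("a", "b", "sigma")) {
  csv_contents <- cmdstanr::read_cmdstan_csv(
    files = files,
    variables = variables,      # only these variables are read
    sampler_diagnostics = ""    # "" means read no sampler diagnostics
  )
  # Draws after warmup, in the default draws format
  csv_contents$post_warmup_draws
}

# Usage, assuming the four csv files from the sampling call are in getwd():
# csv_files <- paste0("simple_regression_fit-", 1:4, ".csv")
# draws <- read_scalar_draws(csv_files)
# posterior::summarise_draws(draws)
```

If this call succeeds while the full read crashes, that would point at the number of columns rather than at the files themselves.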


Session info

Running on a Windows computer with 16 cores and 128GB of RAM (so I don't think it's a question of memory).


utils::sessionInfo()

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rstan_2.21.5 ggplot2_3.3.6 StanHeaders_2.21.0-7 cmdstanr_0.5.2

loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 pillar_1.7.0 compiler_4.2.0 prettyunits_1.1.1 tools_4.2.0 pkgbuild_1.3.1 jsonlite_1.8.0 lifecycle_1.0.1
[9] tibble_3.1.7 gtable_0.3.0 checkmate_2.1.0 pkgconfig_2.0.3 rlang_1.0.2 cli_3.3.0 DBI_1.1.3 parallel_4.2.0
[17] xfun_0.31 loo_2.5.1 gridExtra_2.3 withr_2.5.0 dplyr_1.0.9 knitr_1.39 generics_0.1.2 vctrs_0.4.1
[25] stats4_4.2.0 grid_4.2.0 tidyselect_1.1.2 inline_0.3.19 glue_1.6.2 R6_2.5.1 processx_3.6.1 fansi_1.0.3
[33] distributional_0.3.0 tensorA_0.36.2 callr_3.7.0 farver_2.1.0 purrr_0.3.4 posterior_1.2.2 magrittr_2.0.3 codetools_0.2-18
[41] matrixStats_0.62.0 ps_1.7.1 backports_1.4.1 scales_1.2.0 ellipsis_0.3.2 abind_1.4-5 assertthat_0.2.1 colorspace_2.0-3
[49] utf8_1.2.2 RcppParallel_5.1.5 munsell_0.5.0 crayon_1.5.1

AdamCSmithCWS, Jul 15 '22 19:07

Thanks for the report. Fixing this might require some more assistance from your side, as I was unable to reproduce the crash on my machines. I tried both a Mac and a Windows machine.

Please first run:

stanfit <- model$sample(
  data=stan_data,
  refresh=200,
  chains=4, 
  iter_sampling=1000,
  iter_warmup=1000,
  parallel_chains = 4,
  output_dir = output,
  output_basename = "simple_regression_fit",
  diagnostics = NULL
)

Note the added diagnostics = NULL above. This prevents reading in the divergences, treedepth and EBFMI columns from the CSV after sampling finishes.

Assuming the above completes fine, please run the following afterwards:

res <- list()
i <- 1
for (output_file in stanfit$output_files()) {
  fread_cmd <- paste0("grep -v '^#' --color=never '", output_file, "'")
  res[[i]] <- data.table::fread(
    cmd = fread_cmd,
    data.table = FALSE
  )
  i <- i + 1
}

rok-cesnovar, Jul 17 '22 09:07

Thanks for your help. The sample statement with diagnostics = NULL still causes the crash. Same behaviour as the other sample calls, where the sampling seems to finish fine, but then the R-session crashes with no errors or warnings.

After the crash, I started a new R-session in the same working directory and ran the following.

csv_files <- paste0("simple_regression_fit-",1:4,".csv")

res <- list()
i <- 1
for (output_file in csv_files) {
  fread_cmd <- paste0("grep -v '^#' --color=never '", output_file, "'")
  res[[i]] <- data.table::fread(
    cmd = fread_cmd,
    data.table = FALSE
  )
  i <- i + 1
}

That loop runs fine. The result is a large list (~8GB), with 4 elements and each element of the list is a data frame (see below). I'll see if I can find another Windows machine that doesn't cause the error... So weird.

str(res[[1]])
# $ :'data.frame':	1000 obs. of  250010 variables:
#  ..$ lp__          : int [1:1000] 175564 175565 175565 175565 175563 175564 175562 175564 175565 175565 ...
#  ..$ accept_stat__ : num [1:1000] 0.934 0.988 0.99 0.597 0.72 ...
#  ..$ stepsize__    : num [1:1000] 0.178 0.178 0.178 0.178 0.178 ...
#  ..$ treedepth__   : int [1:1000] 3 2 2 2 2 2 2 3 2 2 ...
#  ..$ n_leapfrog__  : int [1:1000] 7 3 7 3 3 7 3 7 3 3 ...
#  ..$ divergent__   : int [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
#  ..$ energy__      : int [1:1000] -175564 -175564 -175564 -175561 -175562 -175562 -175561 -175560 -175564 -175564 ...
#  ..$ a             : num [1:1000] -8.15e-04 -3.56e-04 -6.82e-05 -3.28e-04 -4.23e-04 ...
#  ..$ b             : num [1:1000] 1.5 1.5 1.5 1.5 1.5 ...
#  ..$ sigma         : num [1:1000] 0.301 0.301 0.3 0.3 0.3 ...
#  ..$ log_lik.1     : num [1:1000] -0.0705 -0.0707 -0.0717 -0.0708 -0.0663 ...
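The ~8GB size is roughly what you'd expect if every column were stored as a double. A back-of-the-envelope check, assuming 8 bytes per value and ignoring R's per-object overhead (a few columns are stored as 4-byte integers, so this slightly overestimates):

```r
# Rough memory estimate for the list of data frames read by fread.
# Assumption: every value is an 8-byte double; per-object overhead is ignored.
n_draws  <- 1000     # post-warmup draws per chain (iter_sampling)
n_cols   <- 250010   # columns per chain, from str(res[[1]])
n_chains <- 4
bytes_per_double <- 8

gb_per_chain <- n_draws * n_cols * bytes_per_double / 1e9
gb_total     <- gb_per_chain * n_chains

gb_per_chain  # about 2 GB per chain
gb_total      # about 8 GB for all four chains
```

So the raw draws alone should fit comfortably in 128GB of RAM, although any copies made while reading or converting them would add to that.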

AdamCSmithCWS, Jul 18 '22 13:07

Good afternoon, just wanted to chime in here to say that I can replicate the original crash on my computer. This is after freshly installing cmdstan this morning. RStudio crashes after sampling is complete for me. Here is the sessionInfo from just before sampling:

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] cmdstanr_0.5.2

loaded via a namespace (and not attached):
[1] pillar_1.6.4 compiler_4.0.3 tools_4.0.3 jsonlite_1.7.3
[5] lifecycle_1.0.1 tibble_3.1.6 gtable_0.3.0 checkmate_2.0.0
[9] pkgconfig_2.0.3 rlang_0.4.12 DBI_1.1.0 xfun_0.29
[13] withr_2.5.0 dplyr_1.0.7 knitr_1.37 generics_0.1.1
[17] vctrs_0.3.8 grid_4.0.3 tidyselect_1.1.1 glue_1.6.1
[21] R6_2.5.1 processx_3.7.0 fansi_1.0.2 distributional_0.3.0
[25] tensorA_0.36.2 ggplot2_3.3.5 farver_2.1.0 purrr_0.3.4
[29] posterior_1.2.1 blob_1.2.1 magrittr_2.0.1 backports_1.2.0
[33] scales_1.1.1 ps_1.4.0 ellipsis_0.3.2 abind_1.4-5
[37] assertthat_0.2.1 colorspace_2.0-2 utf8_1.2.2 munsell_0.5.0
[41] crayon_1.4.2

BrandonEdwards, Jul 20 '22 19:07

@gravesti did you try this already? I cannot try it yet because installation fails for me on Windows.

danielinteractive, Nov 11 '22 13:11

@danielinteractive It runs fine for me with this setup. I'll try again with other versions similar to those above.

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_Switzerland.utf8  LC_CTYPE=English_Switzerland.utf8    LC_MONETARY=English_Switzerland.utf8
[4] LC_NUMERIC=C                         LC_TIME=English_Switzerland.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cmdstanr_0.5.3

loaded via a namespace (and not attached):
 [1] magrittr_2.0.3       munsell_0.5.0        colorspace_2.0-3     R6_2.5.1             rlang_1.0.6         
 [6] fansi_1.0.3          tools_4.2.2          grid_4.2.2           data.table_1.14.4    checkmate_2.1.0     
[11] gtable_0.3.1         utf8_1.2.2           cli_3.4.1            posterior_1.3.1      withr_2.5.0         
[16] matrixStats_0.62.0   abind_1.4-5          tibble_3.1.8         lifecycle_1.0.3      processx_3.8.0      
[21] tensorA_0.36.2       farver_2.1.1         ggplot2_3.3.6        ps_1.7.2             vctrs_0.5.0         
[26] glue_1.6.2           compiler_4.2.2       pillar_1.8.1         generics_0.1.3       scales_1.2.1        
[31] backports_1.4.1      distributional_0.3.1 jsonlite_1.8.3       pkgconfig_2.0.3     

gravesti, Nov 11 '22 19:11

@rok-cesnovar could it be that this was fixed with newer R versions?

danielinteractive, Nov 11 '22 19:11

Maybe yeah. Given that a few of us have tried to replicate this issue and were unable to, let's close this. Thanks!

rok-cesnovar, Nov 11 '22 19:11