Segfault in R 3.4.3 on Mac
@JackStat FYI: running caret with metric "ROC" consistently produces a segfault on macOS Sierra with R 3.4.3. The segfault message and example code are shown below. I have been able to reproduce it on multiple datasets and with multiple estimation algorithms. The segfault message refers to ModelMetrics_auc_. The code example works fine if I provide my own function to calculate AUC.
I hope you'll be able to take a look at this. Please let me know if you need more information. Thanks
cc-ing @topepo
*** caught segfault ***
address 0x18, cause 'memory not mapped'
Traceback:
1: .Call("ModelMetrics_auc_", PACKAGE = "ModelMetrics", actual, predicted, ranks)
2: auc_(actual, predicted, ranks)
3: ModelMetrics::auc(ifelse(data$obs == lev[2], 0, 1), data[, lvls[1]])
4: ctrl$summaryFunction(testOutput, lev, method)
5: evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels, metric = metric, method = method)
6: train.default(x, y, weights = w, ...)
7: train(x, y, weights = w, ...)
8: train.formula(vs ~ ., data = dat, method = "ranger", trControl = ctrl, tuneGrid = grid, metric = "ROC", verbose = FALSE)
9: train(vs ~ ., data = dat, method = "ranger", trControl = ctrl, tuneGrid = grid, metric = "ROC", verbose = FALSE)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
## loading libraries
library(ranger)
library(caret)
dat <- mtcars
dat$vs <- factor(ifelse(dat$vs == 1, "yes", "no"))
sapply(dat, class)
ranger(
vs ~ .,
data = dat,
probability = TRUE,
num.trees = 50,
mtry = 3
)
set.seed(1234)
grid <- expand.grid(mtry = 3:4, splitrule = "gini", min.node.size = 1)
ctrl <- trainControl(
method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = twoClassSummary,
verboseIter = TRUE
)
result <- train(
vs ~ .,
data = dat,
method = "ranger",
trControl = ctrl,
tuneGrid = grid,
metric = "ROC",
verbose = FALSE
)
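A minimal sketch of the workaround mentioned above, i.e. a custom summary function that computes AUC in plain R instead of calling ModelMetrics. The name `aucSummary` and the rank-based (Mann-Whitney) formula are illustrative assumptions, not the code actually used in this report.

```r
# Hypothetical replacement for twoClassSummary that avoids ModelMetrics::auc.
# Assumes `data` has a factor column `obs` and one probability column per
# class level (which caret supplies when classProbs = TRUE).
aucSummary <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data$obs)
  is_event <- data$obs == lev[1]   # first level is treated as the event
  score <- data[[lev[1]]]          # predicted probability of the event
  n_pos <- sum(is_event)
  n_neg <- sum(!is_event)
  if (n_pos == 0 || n_neg == 0) return(c(ROC = NA_real_))
  # Mann-Whitney form of AUC; rank() averages ties by default.
  r <- rank(score)
  c(ROC = (sum(r[is_event]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))
}

# Usage sketch: same trainControl as above, with the summary function swapped.
# ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE,
#                      summaryFunction = aucSummary, verboseIter = TRUE)
```

Because the returned vector is named `ROC`, `metric = "ROC"` in `train()` should still find it; sensitivity and specificity are omitted here for brevity.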
Session Info:
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] caret_6.0-78 ggplot2_2.2.1 lattice_0.20-35 ranger_0.9.0
loaded via a namespace (and not attached):
[1] tidyselect_0.2.3 purrr_0.2.4 reshape2_1.4.3 kernlab_0.9-25 splines_3.4.3 colorspace_1.3-2
[7] stats4_3.4.3 yaml_2.1.16 survival_2.41-3 prodlim_1.6.1 rlang_0.2.0.9000 ModelMetrics_1.1.0
[13] pillar_1.1.0 withr_2.1.1 foreign_0.8-69 glue_1.2.0 bindrcpp_0.2 foreach_1.4.3
[19] bindr_0.1.0.9000 plyr_1.8.4 dimRed_0.1.0 lava_1.6 robustbase_0.92-8 stringr_1.3.0
[25] timeDate_3042.101 munsell_0.4.3 gtable_0.2.0 recipes_0.1.2 codetools_0.2-15 psych_1.7.8
[31] parallel_3.4.3 class_7.3-14 DEoptimR_1.0-8 broom_0.4.3 Rcpp_0.12.15 scales_0.5.0
[37] ipred_0.9-6 CVST_0.2-1 mnormt_1.5-5 stringi_1.1.6 dplyr_0.7.4 RcppRoll_0.2.2
[43] ddalpha_1.3.1.1 grid_3.4.3 tools_3.4.3 magrittr_1.5 lazyeval_0.2.1 tibble_1.4.2
[49] tidyr_0.8.0 DRR_0.0.3 pkgconfig_2.0.1 MASS_7.3-48 Matrix_1.2-12 lubridate_1.7.2
[55] gower_0.1.2 assertthat_0.2.0 iterators_1.0.8 R6_2.2.2 rpart_4.1-12 sfsmisc_1.1-0
[61] nnet_7.3-12 nlme_3.1-131 compiler_3.4.3
Strange. I was not able to reproduce the error on High Sierra.
> ## loading libraries
> library(ranger)
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
>
> dat <- mtcars
> dat$vs <- factor(ifelse(dat$vs == 1, "yes", "no"))
> sapply(dat, class)
mpg cyl disp hp drat wt qsec vs am
"numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "factor" "numeric"
gear carb
"numeric" "numeric"
>
> ranger(
+ vs ~ .,
+ data = dat,
+ probability = TRUE,
+ num.trees = 50,
+ mtry = 3
+ )
Ranger result
Call:
ranger(vs ~ ., data = dat, probability = TRUE, num.trees = 50, mtry = 3)
Type: Probability estimation
Number of trees: 50
Sample size: 32
Number of independent variables: 10
Mtry: 3
Target node size: 10
Variable importance mode: none
OOB prediction error: 0.07048678
>
> set.seed(1234)
> grid <- expand.grid(mtry = 3:4, splitrule = "gini", min.node.size = 1)
> ctrl <- trainControl(
+ method = "cv",
+ number = 5,
+ classProbs = TRUE,
+ summaryFunction = twoClassSummary,
+ verboseIter = TRUE
+ )
>
> result <- train(
+ vs ~ .,
+ data = dat,
+ method = "ranger",
+ trControl = ctrl,
+ tuneGrid = grid,
+ metric = "ROC",
+ verbose = FALSE
+ )
+ Fold1: mtry=3, splitrule=gini, min.node.size=1
- Fold1: mtry=3, splitrule=gini, min.node.size=1
+ Fold1: mtry=4, splitrule=gini, min.node.size=1
- Fold1: mtry=4, splitrule=gini, min.node.size=1
+ Fold2: mtry=3, splitrule=gini, min.node.size=1
- Fold2: mtry=3, splitrule=gini, min.node.size=1
+ Fold2: mtry=4, splitrule=gini, min.node.size=1
- Fold2: mtry=4, splitrule=gini, min.node.size=1
+ Fold3: mtry=3, splitrule=gini, min.node.size=1
- Fold3: mtry=3, splitrule=gini, min.node.size=1
+ Fold3: mtry=4, splitrule=gini, min.node.size=1
- Fold3: mtry=4, splitrule=gini, min.node.size=1
+ Fold4: mtry=3, splitrule=gini, min.node.size=1
- Fold4: mtry=3, splitrule=gini, min.node.size=1
+ Fold4: mtry=4, splitrule=gini, min.node.size=1
- Fold4: mtry=4, splitrule=gini, min.node.size=1
+ Fold5: mtry=3, splitrule=gini, min.node.size=1
- Fold5: mtry=3, splitrule=gini, min.node.size=1
+ Fold5: mtry=4, splitrule=gini, min.node.size=1
- Fold5: mtry=4, splitrule=gini, min.node.size=1
Aggregating results
Selecting tuning parameters
Fitting mtry = 3, splitrule = gini, min.node.size = 1 on full training set
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] caret_6.0-78 ggplot2_2.2.1 lattice_0.20-35 ranger_0.9.0
loaded via a namespace (and not attached):
[1] tidyselect_0.2.4 purrr_0.2.4 reshape2_1.4.3 kernlab_0.9-25
[5] splines_3.4.3 colorspace_1.3-2 stats4_3.4.3 yaml_2.1.17
[9] survival_2.41-3 prodlim_1.6.1 rlang_0.2.0 e1071_1.6-8
[13] ModelMetrics_1.1.9001 pillar_1.2.1 withr_2.1.1 foreign_0.8-69
[17] glue_1.2.0 bindrcpp_0.2 foreach_1.4.4 bindr_0.1
[21] plyr_1.8.4 dimRed_0.1.0 lava_1.6 robustbase_0.92-8
[25] stringr_1.3.0 timeDate_3043.102 munsell_0.4.3 gtable_0.2.0
[29] recipes_0.1.2 codetools_0.2-15 psych_1.7.8 parallel_3.4.3
[33] class_7.3-14 DEoptimR_1.0-8 broom_0.4.3 Rcpp_0.12.15
[37] scales_0.5.0 ipred_0.9-6 CVST_0.2-1 mnormt_1.5-5
[41] stringi_1.1.6 dplyr_0.7.4 RcppRoll_0.2.2 ddalpha_1.3.1.1
[45] grid_3.4.3 tools_3.4.3 magrittr_1.5 lazyeval_0.2.1
[49] tibble_1.4.2 tidyr_0.8.0 DRR_0.0.3 pkgconfig_2.0.1
[53] MASS_7.3-47 Matrix_1.2-12 data.table_1.10.4-3 lubridate_1.7.3
[57] gower_0.1.2 assertthat_0.2.0 iterators_1.0.9 R6_2.2.2
[61] rpart_4.1-11 sfsmisc_1.1-2 nnet_7.3-12 nlme_3.1-131
[65] compiler_3.4.3
Are you seeing the issue on any other operating system? Also, can you try cloning the package and building it? I am curious whether you can build it without any issues.
Thanks for checking @JackStat! From your sessionInfo(), however, it looks like you are running ModelMetrics_1.1.9001 rather than ModelMetrics_1.1.0 from CRAN. I tried installing the version you are using from GitHub but got the following error:
auc_.cpp:2:10: fatal error: 'omp.h' file not found
#include <omp.h>
^~~~~~~
1 error generated.
make: *** [auc_.o] Error 1
ERROR: compilation failed for package ‘ModelMetrics’
* removing ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/ModelMetrics’
* restoring previous ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/ModelMetrics’
Installation failed: Command failed (1)
OK, this likely sounds scarier than it is. Can you follow this guide and try again?
https://thecoatlessprofessor.com/programming/openmp-in-r-on-os-x/#after-3-4-0
I installed clang 4.0.0 and was able to build the package from GitHub and run the code successfully. I already had gfortran 6.3 installed. Interestingly, ModelMetrics 1.1.0 from CRAN now also works.
Is it feasible to use ModelMetrics (and caret) without installing clang and gfortran? Thanks!
I will do some research on this and see if there is a viable alternative to OpenMP. Threading (std::thread) is standard in C++11, but I need some time to figure that out.
Thanks @JackStat. Would it be possible to have ModelMetrics fall back to an alternative calculation method (e.g., regular R code) if OpenMP is not available?
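To make the suggestion concrete, here is a rough sketch of what such a fallback could look like at the R level. The names `auc_r` and `auc_safe` and the `compiled_ok` flag are hypothetical; nothing in ModelMetrics is known to expose an OpenMP availability check, so the caller is assumed to set the flag.

```r
# Plain-R AUC: the probability that a random positive outscores a random
# negative, counting ties as one half. Builds an n_pos x n_neg matrix, so it
# is only suitable for modest sample sizes.
auc_r <- function(actual, predicted) {
  pos <- predicted[actual == 1]
  neg <- predicted[actual == 0]
  if (length(pos) == 0 || length(neg) == 0) return(NA_real_)
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Hypothetical dispatcher: use the compiled routine only when the caller
# asserts it is safe (compiled_ok is an assumed, user-supplied flag).
auc_safe <- function(actual, predicted, compiled_ok = FALSE) {
  if (compiled_ok && requireNamespace("ModelMetrics", quietly = TRUE)) {
    ModelMetrics::auc(actual, predicted)
  } else {
    auc_r(actual, predicted)
  }
}
```

Note that a segfault cannot be caught with `tryCatch()`, which is why this sketch decides up front rather than attempting the compiled call and recovering.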
Maybe obvious, but others who land here may just need to install OpenMP (brew install libomp).