`spatial_clustering_cv` retains geometry in folds causing `fit_resamples` to fail
The problem
When I use `spatial_clustering_cv()` to create spatial resamples, the geometry column is retained in the folds. `fit_resamples()` then fails with an error saying that not all columns of `y` are known outcome types. I'm not sure whether `spatial_clustering_cv()` should drop the spatial information from the folds or whether `fit_resamples()` should ignore the geometry column. I may be missing something.
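A quick way to see where the stray `geometry` column might come from is sketched below. It relies on the fact that geometry is "sticky" in `sf` objects, so selecting a single column still returns the geometry alongside it. That `add_variables()` picks up the geometry through a selection like this is an assumption on my part, not something confirmed in this thread. The object names `outcome_sf`, `outcome_df`, and `first_analysis` and the seed are only for illustration.

```r
library(sf)
library(dplyr, warn.conflicts = FALSE)
library(spatialsample)
library(rsample)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Selecting one column from an sf object keeps the geometry column as well.
outcome_sf <- select(nc, BIR74)
names(outcome_sf)
#> [1] "BIR74"    "geometry"

# After dropping the geometry, the same selection returns only the outcome.
outcome_df <- select(st_drop_geometry(nc), BIR74)
names(outcome_df)
#> [1] "BIR74"

# The analysis set of a spatial split is still an sf object.
set.seed(123) # arbitrary seed, only to make the clustering repeatable
nc_folds <- spatial_clustering_cv(nc, v = 5)
first_analysis <- analysis(nc_folds$splits[[1]])
inherits(first_analysis, "sf")
```

If the last call returns `TRUE`, any code that selects outcome columns from the analysis data will carry the geometry with it unless it is dropped explicitly.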
Reproducible example
# Load packages
library(dplyr, warn.conflicts = FALSE)
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(spatialsample)
library(workflows)
library(parsnip)
library(tune)
# Example data
nc <- st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
# Making spatial clusters
nc_folds <- spatial_clustering_cv(nc, v = 5)
# Workflow for linear regression
lr_recipe <- workflow() %>%
  add_variables(outcomes = BIR74,
                predictors = AREA) %>%
  add_model(linear_reg(engine = "lm"))
# Tuning parameters: Fail
(spatial_lr <- fit_resamples(lr_recipe, nc_folds))
#> → A | error: Not all columns of `y` are known outcome types. These columns have unknown types: 'geometry'.
#> There were issues with some computations A: x1
#> There were issues with some computations A: x5
#>
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> # Resampling results
#> # 5-fold spatial cross-validation
#> # A tibble: 5 × 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [77/23]> Fold1 <NULL> <tibble [1 × 3]>
#> 2 <split [75/25]> Fold2 <NULL> <tibble [1 × 3]>
#> 3 <split [79/21]> Fold3 <NULL> <tibble [1 × 3]>
#> 4 <split [84/16]> Fold4 <NULL> <tibble [1 × 3]>
#> 5 <split [85/15]> Fold5 <NULL> <tibble [1 × 3]>
#>
#> There were issues with some computations:
#>
#> - Error(s) x5: Not all columns of `y` are known outcome types. These columns hav...
#>
#> Run `show_notes(.Last.tune.result)` for more information.
# Best tuning parameters: Fail
collect_metrics(spatial_lr)
#> Error in `estimate_tune_results()`:
#> ! All models failed. Run `show_notes(.Last.tune.result)` for more information.
# Try with st_drop_geometry:
orig_class <- class(nc_folds)
nc_folds <- nc_folds %>%
  mutate(splits = purrr::map(splits, ~ {
    .x$data <- st_drop_geometry(.x$data)
    .x
  }))
class(nc_folds) <- orig_class
# Tuning parameters
(spatial_lr <- fit_resamples(lr_recipe, nc_folds))
#> # Resampling results
#> # -fold spatial cross-validation
#> # A tibble: 5 × 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [77/23]> Fold1 <tibble [2 × 4]> <tibble [0 × 3]>
#> 2 <split [75/25]> Fold2 <tibble [2 × 4]> <tibble [0 × 3]>
#> 3 <split [79/21]> Fold3 <tibble [2 × 4]> <tibble [0 × 3]>
#> 4 <split [84/16]> Fold4 <tibble [2 × 4]> <tibble [0 × 3]>
#> 5 <split [85/15]> Fold5 <tibble [2 × 4]> <tibble [0 × 3]>
# Best tuning parameters
collect_metrics(spatial_lr)
#> # A tibble: 2 × 6
#> .metric .estimator mean n std_err .config
#> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 rmse standard 3542. 5 634. Preprocessor1_Model1
#> 2 rsq standard 0.178 5 0.0616 Preprocessor1_Model1
Created on 2024-07-19 with reprex v2.1.1
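The `st_drop_geometry()` workaround can also be wrapped in a small helper, sketched here. `drop_split_geometry()` is a hypothetical name and not part of spatialsample or rsample. The splits are assigned back with `$<-` instead of `mutate()`, in the hope of leaving the rset's class and attributes alone. I have not verified that this holds across package versions, so the last lines inspect the result rather than assume it.

```r
library(sf)
library(spatialsample)

# Hypothetical helper (not part of spatialsample): remove the geometry
# column from the data stored inside a single split.
drop_split_geometry <- function(split) {
  split$data <- sf::st_drop_geometry(split$data)
  split
}

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
set.seed(123) # arbitrary seed, only to make the clustering repeatable
nc_folds <- spatial_clustering_cv(nc, v = 5)

# Replace the list column in place instead of going through mutate().
nc_folds$splits <- lapply(nc_folds$splits, drop_split_geometry)

# Check that the geometry is gone and see what class the folds still have.
inherits(nc_folds$splits[[1]]$data, "sf")
class(nc_folds)
```

The row indices stored in each split stay valid because `st_drop_geometry()` removes only a column, not rows. The fold assignment itself is still the spatial one computed before the geometry was dropped.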
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14 ucrt)
#> os Windows 11 x64 (build 22635)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Spanish_Peru.utf8
#> ctype Spanish_Peru.utf8
#> tz America/Lima
#> date 2024-07-19
#> pandoc 3.1.12.3 @ c:\\Program Files\\Positron\\bin\\pandoc/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> class 7.3-22 2023-05-03 [2] CRAN (R 4.4.1)
#> classInt 0.4-10 2023-09-05 [1] CRAN (R 4.4.0)
#> cli 3.6.3.9000 2024-06-28 [1] Github (r-lib/cli@d9febb5)
#> codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.1)
#> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.4.0)
#> data.table 1.15.4 2024-03-30 [1] CRAN (R 4.4.0)
#> DBI 1.2.3 2024-06-02 [1] CRAN (R 4.4.0)
#> dials 1.2.1 2024-02-22 [1] CRAN (R 4.4.1)
#> DiceDesign 1.10 2023-12-07 [1] CRAN (R 4.4.1)
#> digest 0.6.36 2024-06-23 [1] CRAN (R 4.4.1)
#> dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
#> e1071 1.7-14 2023-12-06 [1] CRAN (R 4.4.0)
#> evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
#> foreach 1.5.2 2022-02-02 [1] CRAN (R 4.4.0)
#> fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
#> furrr 0.3.1 2022-08-15 [1] CRAN (R 4.4.0)
#> future 1.33.2 2024-03-26 [1] CRAN (R 4.4.0)
#> future.apply 1.11.2 2024-03-28 [1] CRAN (R 4.4.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
#> ggplot2 3.5.1 2024-04-23 [1] CRAN (R 4.4.1)
#> globals 0.16.3 2024-03-08 [1] CRAN (R 4.4.0)
#> glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
#> gower 1.0.1 2022-12-22 [1] CRAN (R 4.4.0)
#> GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.4.1)
#> gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
#> hardhat 1.4.0 2024-06-02 [1] CRAN (R 4.4.1)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#> ipred 0.9-14 2023-03-09 [1] CRAN (R 4.4.1)
#> iterators 1.0.14 2022-02-05 [1] CRAN (R 4.4.0)
#> KernSmooth 2.23-24 2024-05-17 [2] CRAN (R 4.4.1)
#> knitr 1.48 2024-07-07 [1] CRAN (R 4.4.1)
#> lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.1)
#> lava 1.8.0 2024-03-05 [1] CRAN (R 4.4.1)
#> lhs 1.2.0 2024-06-30 [1] CRAN (R 4.4.1)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
#> listenv 0.9.1 2024-01-29 [1] CRAN (R 4.4.0)
#> lubridate 1.9.3 2023-09-27 [1] CRAN (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
#> MASS 7.3-60.2 2024-04-26 [2] CRAN (R 4.4.1)
#> Matrix 1.7-0 2024-04-26 [2] CRAN (R 4.4.1)
#> munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
#> nnet 7.3-19 2023-05-03 [2] CRAN (R 4.4.1)
#> parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.4.0)
#> parsnip * 1.2.1 2024-03-22 [1] CRAN (R 4.4.1)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
#> prodlim 2024.06.25 2024-06-24 [1] CRAN (R 4.4.1)
#> proxy 0.4-27 2022-06-09 [1] CRAN (R 4.4.0)
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
#> Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.4.0)
#> recipes 1.1.0 2024-07-04 [1] CRAN (R 4.4.1)
#> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.4.1)
#> rlang 1.1.4.9000 2024-06-28 [1] Github (r-lib/rlang@cebbabf)
#> rmarkdown 2.27 2024-05-17 [1] CRAN (R 4.4.0)
#> rpart 4.1.23 2023-12-05 [2] CRAN (R 4.4.1)
#> rsample 1.2.1 2024-03-25 [1] CRAN (R 4.4.1)
#> s2 1.1.6 2023-12-19 [1] CRAN (R 4.4.0)
#> scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
#> sf * 1.0-16 2024-03-24 [1] CRAN (R 4.4.0)
#> spatialsample * 0.5.1 2023-11-08 [1] CRAN (R 4.4.1)
#> survival 3.6-4 2024-04-24 [2] CRAN (R 4.4.1)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
#> tidyr 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
#> timechange 0.3.0 2024-01-18 [1] CRAN (R 4.4.0)
#> timeDate 4032.109 2023-12-14 [1] CRAN (R 4.4.0)
#> tune * 1.2.1 2024-04-18 [1] CRAN (R 4.4.1)
#> units 0.8-5 2023-11-28 [1] CRAN (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
#> withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0)
#> wk 0.9.2 2024-07-09 [1] CRAN (R 4.4.1)
#> workflows * 1.1.4 2024-02-19 [1] CRAN (R 4.4.1)
#> xfun 0.45 2024-06-16 [1] CRAN (R 4.4.1)
#> yaml 2.3.9 2024-07-05 [1] CRAN (R 4.4.1)
#> yardstick 1.3.1 2024-03-21 [1] CRAN (R 4.4.1)
#>
#> [1] C:/Users/brian/AppData/Local/R/win-library/4.4
#> [2] C:/Program Files/R/R-4.4.1/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Try using `add_formula()` instead of `add_variables()` as a workaround.
(Sorry for the brief reply -- I'm traveling at the moment so can't run stuff, but wanted to make sure I could try to help you get unstuck. This is definitely a bug somewhere)
Interesting. If I use `add_formula()` instead, it does work.
# Load packages
library(dplyr, warn.conflicts = FALSE)
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(spatialsample)
library(workflows)
library(parsnip)
library(tune)
# Example data
nc <- st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
# Making spatial clusters
nc_folds <- spatial_clustering_cv(nc, v = 5)
# Workflow for linear regression
lr_recipe <- workflow() %>%
  add_formula(BIR74 ~ AREA) %>%
  add_model(linear_reg(engine = "lm"))
# Tuning parameters
(spatial_lr <- fit_resamples(lr_recipe, nc_folds))
#> # Resampling results
#> # 5-fold spatial cross-validation
#> # A tibble: 5 × 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [79/21]> Fold1 <tibble [2 × 4]> <tibble [0 × 3]>
#> 2 <split [75/25]> Fold2 <tibble [2 × 4]> <tibble [0 × 3]>
#> 3 <split [77/23]> Fold3 <tibble [2 × 4]> <tibble [0 × 3]>
#> 4 <split [85/15]> Fold4 <tibble [2 × 4]> <tibble [0 × 3]>
#> 5 <split [84/16]> Fold5 <tibble [2 × 4]> <tibble [0 × 3]>
# Best tuning parameters:
collect_metrics(spatial_lr)
#> # A tibble: 2 × 6
#> .metric .estimator mean n std_err .config
#> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 rmse standard 3542. 5 634. Preprocessor1_Model1
#> 2 rsq standard 0.178 5 0.0616 Preprocessor1_Model1
Created on 2024-07-22 with reprex v2.1.1