Expand grouping variables for bootstrap intervals
For some tune internals, it would be helpful to be able to make intervals for an extended set of grouping columns (as opposed to just term). See tidymodels/tune#818.
These changes are a proposal to also treat columns whose names start with a period as grouping variables. We can discuss it, and I can create more unit tests if we're good with this.
Here's an example:
library(tidymodels)
tidymodels_prefer()
theme_set(theme_bw())
options(pillar.advice = FALSE, pillar.min_title_chars = Inf)
# Get regression estimates for each house type
lm_est <- function(split, ...) {
  analysis(split) %>%
    tidyr::nest(.by = c(type)) %>%
    mutate(
      betas = purrr::map(data, ~ lm(log10(price) ~ sqft, data = .x) %>% tidy())
    ) %>%
    rename(.type = type) %>%
    select(.type, betas) %>%
    unnest(cols = betas)
}
set.seed(52156)
house_rs <-
  bootstraps(Sacramento, 1000, apparent = TRUE) %>%
  mutate(results = map(splits, lm_est))
int_pctl(house_rs, results)
#> # A tibble: 6 × 7
#> term .type .lower .estimate .upper .alpha .method
#> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 (Intercept) Condo 4.45 4.59 4.72 0.05 percentile
#> 2 (Intercept) Multi_Family 4.74 5.25 5.71 0.05 percentile
#> 3 (Intercept) Residential 4.93 4.96 4.99 0.05 percentile
#> 4 sqft Condo 0.000412 0.000520 0.000659 0.05 percentile
#> 5 sqft Multi_Family -0.000197 0.0000344 0.000277 0.05 percentile
#> 6 sqft Residential 0.000211 0.000225 0.000240 0.05 percentile
int_t(house_rs, results)
#> # A tibble: 6 × 7
#> term .type .lower .estimate .upper .alpha .method
#> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 (Intercept) Condo 4.47 4.59 4.73 0.05 student-t
#> 2 (Intercept) Multi_Family 4.81 5.25 5.78 0.05 student-t
#> 3 (Intercept) Residential 4.93 4.96 4.99 0.05 student-t
#> 4 sqft Condo 0.000386 0.000520 0.000621 0.05 student-t
#> 5 sqft Multi_Family -0.000193 0.0000344 0.000223 0.05 student-t
#> 6 sqft Residential 0.000210 0.000225 0.000239 0.05 student-t
int_bca(house_rs, results, .fn = lm_est)
#> # A tibble: 6 × 7
#> term .type .lower .estimate .upper .alpha .method
#> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 (Intercept) Residential 4.94 4.96 4.99 0.05 BCa
#> 2 sqft Residential 0.000210 0.000225 0.000239 0.05 BCa
#> 3 (Intercept) Condo 4.47 4.59 4.74 0.05 BCa
#> 4 sqft Condo 0.000395 0.000520 0.000638 0.05 BCa
#> 5 (Intercept) Multi_Family 4.64 5.25 5.62 0.05 BCa
#> 6 sqft Multi_Family -0.000156 0.0000344 0.000330 0.05 BCa
Created on 2024-01-19 with reprex v2.0.2
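For discussion, here is a rough sketch of the column-selection rule proposed above: term plus any column whose name starts with a period defines a group. This is not the code in this PR; `find_group_cols()` and the `reserved` vector are hypothetical names, and excluding the interval output columns is my assumption about what a real implementation would need.

```r
# Hypothetical helper (not part of rsample): pick the columns that would
# define the groups when computing bootstrap intervals.
find_group_cols <- function(stats) {
  # Any column whose name starts with a period is a candidate
  dot_cols <- grep("^\\.", names(stats), value = TRUE)
  # Assumed exclusion list: dot-prefixed names that hold interval results
  reserved <- c(".lower", ".estimate", ".upper", ".alpha", ".method")
  # `term` is always a grouping column; the dot-prefixed ones extend it
  c("term", setdiff(dot_cols, reserved))
}

# Toy statistics table shaped like one row group from the example above
stats <- data.frame(
  term = c("(Intercept)", "sqft"),
  .type = c("Condo", "Condo"),
  estimate = c(4.59, 0.00052)
)

group_cols <- find_group_cols(stats)
print(group_cols)
```

With the toy table, the helper returns term and .type, so the intervals would be computed once per term and house type, as in the output above.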
This is ready for final review.
I've set up the int_pctl() S3 method for tune_results objects to work with the current interval methods in rsample and with this change.