Steps idea: Dealing with correlation

Open EmilHvitfeldt opened this issue 2 years ago • 0 comments

  1. find the correlation structure
  2. find groups of highly correlated features
  3. replace each group with the principal component(s) computed from just those features
  4. profit
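
A rough sketch of steps 1 to 3, written in base R rather than as a recipes step so that it runs on its own. Everything here is an assumption made for illustration: the helper name `group_pca()`, the 0.8 cutoff, the use of complete-linkage clustering to form the groups, and keeping only the first PC per group. It uses the built-in `mtcars` data instead of `ames`.

```r
# Hypothetical helper, not an existing recipes step.
group_pca <- function(data, threshold = 0.8) {
  # 1. correlation structure (absolute value, so sign does not matter)
  cor_mat <- abs(cor(data))

  # 2. groups of highly correlated features: complete linkage on 1 - |r|,
  #    cut at 1 - threshold, so every pair inside a group has |r| >= threshold
  tree <- hclust(as.dist(1 - cor_mat), method = "complete")
  clusters <- cutree(tree, h = 1 - threshold)

  # 3. replace each multi-feature group with its first principal component;
  #    features that ended up alone are passed through unchanged
  pieces <- lapply(split(names(clusters), clusters), function(cols) {
    if (length(cols) == 1) {
      return(data[cols])
    }
    pc <- prcomp(data[cols], center = TRUE, scale. = TRUE)$x[, 1]
    out <- data.frame(pc)
    names(out) <- paste0("PC_", paste(cols, collapse = "_"))
    out
  })

  do.call(cbind, unname(pieces))
}

reduced <- group_pca(mtcars, threshold = 0.8)
names(reduced)
```

Complete linkage is only one way to define the groups; it is used here because cutting the tree at `1 - threshold` guarantees that all features in a group are pairwise correlated above the threshold. A real step would also need to store the groups and loadings at `prep()` time so they can be reapplied to new data at `bake()` time.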

Look at the correlation filter, step_corr(), as a starting point.

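library(tidymodels)

# Numeric columns with more than 1000 distinct values get spline expansions
spline_cols <- ames |>
  select(where(function(x) is.numeric(x) && n_distinct(x) > 1000)) |>
  names()

# Drop nominal predictors, expand the selected columns into natural splines,
# then plot the correlation matrix of the result
recipe(~., data = ames) |>
  step_rm(all_nominal_predictors()) |>
  step_spline_natural(any_of(spline_cols), deg_free = 10) |>
  prep() |>
  bake(NULL) |>
  corrr::correlate() |>
  autoplot(method = "identity")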

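library(tidymodels)

# Keep only the numeric columns with more than 1000 distinct values
spline_cols <- ames |>
  select(where(function(x) is.numeric(x) && n_distinct(x) > 1000))

# Same plot, but rotate to principal components first (threshold = 1 keeps
# all of the variance) and expand the components into natural splines
recipe(~., data = spline_cols) |>
  step_pca(all_predictors(), threshold = 1) |>
  step_spline_natural(all_predictors(), deg_free = 10) |>
  prep() |>
  bake(NULL) |>
  corrr::correlate() |>
  autoplot(method = "identity")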

EmilHvitfeldt, Oct 01 '23 22:10