
Function that removes step(s) from an existing recipe

Open walrossker opened this issue 4 years ago • 2 comments

Feature

When testing multiple model types (e.g., in an ensemble workflow), one could create a baseline recipe that works for most of the models. Then if one model has slightly different pre-processing requirements (e.g., maybe you'd prefer to remove step_dummy in a rand_forest model or remove step_rm to include more predictors in an elastic net compared to a knn), a single call to this function could remove those steps and avoid the need to copy/paste the original recipe definition and delete certain steps. This helps to keep recipe definitions DRY.

Reproducible Example

FWIW, I chose the verb ignore_ to avoid confusion with 'remove' from step_rm.

Happy to submit a pull request if this seems valuable.

suppressPackageStartupMessages(devtools::load_all("."))
#> Loading recipes

rec <- recipe(Species ~ ., data = iris) %>%
  step_rm(Petal.Width, id = "rm_UDLut") %>%
  step_rm(starts_with("Sepal"), id = "custom_id")

rec %>% ignore_step("custom_id") # remove by custom id value
#> Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          4
#> 
#> Operations:
#> 
#> Variables removed Petal.Width
rec %>% ignore_step(1) # remove the first step
#> Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          4
#> 
#> Operations:
#> 
#> Variables removed starts_with("Sepal")
rec %>% ignore_step("rm") # remove all `step_rm` steps  (i.e., all steps here)
#> Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          4
#> 
#> Operations:

walrossker, Jan 27 '22 21:01

We are looking into this a bit.

If the recipe has been estimated (or partially estimated), this wouldn't be feasible since the recipe would have some critical data that can't be rolled back. We are verifying this.

If the recipe has not been estimated, this is pretty easy to do (just by subsetting the recipe$steps list). In this case, we could offer an easy API that takes number and/or id vectors as inputs and checks whether the recipe has been trained.
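
A minimal sketch of that idea, for illustration only. The helper name drop_steps() is hypothetical and is not part of recipes. It assumes each step object carries `id` and `trained` fields, as the built-in steps do, and it only handles positions and ids, not step types such as "rm".

```r
library(recipes)

# Hypothetical helper (not part of recipes): drop steps from an untrained
# recipe, selected either by position (numeric) or by step id (character).
drop_steps <- function(rec, which) {
  stopifnot(inherits(rec, "recipe"))

  # Refuse to modify a recipe in which any step has already been estimated,
  # since the estimated quantities cannot be rolled back.
  is_trained <- vapply(rec$steps, function(s) isTRUE(s$trained), logical(1))
  if (any(is_trained)) {
    stop("Steps can only be dropped from an untrained recipe.", call. = FALSE)
  }

  ids <- vapply(rec$steps, function(s) s$id, character(1))

  if (is.character(which)) {
    unknown <- setdiff(which, ids)
    if (length(unknown) > 0) {
      stop("Unknown step id(s): ", paste(unknown, collapse = ", "), call. = FALSE)
    }
    drop <- ids %in% which
  } else {
    if (any(which < 1 | which > length(ids))) {
      stop("Step positions must be between 1 and ", length(ids), ".", call. = FALSE)
    }
    drop <- seq_along(ids) %in% which
  }

  # Subset the list of steps; everything else in the recipe is left untouched.
  rec$steps <- rec$steps[!drop]
  rec
}

# Baseline recipe with two steps, built without the pipe to stay self-contained.
rec <- recipe(Species ~ ., data = iris)
rec <- step_rm(rec, Petal.Width, id = "rm_width")
rec <- step_rm(rec, starts_with("Sepal"), id = "custom_id")

by_id <- drop_steps(rec, "custom_id")  # drop the step with this id
by_position <- drop_steps(rec, 1)      # drop the first step
```

Because the helper returns a modified copy, the baseline `rec` keeps both steps and can be reused for the other models. Calling it on the result of `prep(rec)` raises an error rather than attempting to undo the estimation.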

topepo, Feb 03 '22 14:02

Yeah the function I have now only works on unprepped recipes as you describe. I can imagine that removing steps from estimated recipes would be much more complicated.

walrossker, Feb 04 '22 17:02