roles {recipes} | R Documentation |
add_role() adds a new role to an existing variable in the recipe. It does not overwrite old roles, as a single variable can have multiple roles.
update_role() alters an existing role in the recipe.
remove_role() eliminates a single existing role in the recipe.
add_role(recipe, ..., new_role = "predictor", new_type = NULL)
update_role(recipe, ..., new_role = "predictor", old_role = NULL)
remove_role(recipe, ..., old_role)
recipe |
An existing recipe object.
... |
One or more selector functions to choose which variables are
being assigned a role. See |
new_role |
A character string for a single role. |
new_type |
A character string for the specific type that the variable should
be identified as. If left as |
old_role |
A character string for the specific role to update for the
variables selected by |
With add_role(), if a variable is selected that already has the new_role, a warning is emitted and that variable is skipped so no duplicate roles are added. If no role currently exists (e.g. the current role is NA), an error is thrown; update_role() should be used instead.
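The duplicate-role behavior can be checked with a short sketch. The data frame `toy` and its columns are made up for illustration and are not part of the package; the sketch only assumes that `add_role()` signals an ordinary R warning, as described above.

```r
library(recipes)

# Hypothetical toy data, used so the sketch does not rely on a packaged dataset.
toy <- data.frame(
  y  = c(1.2, 3.4, 2.2, 5.1),
  id = c("a", "b", "c", "d"),
  x1 = c(10, 20, 30, 40),
  x2 = c(0.5, 0.1, 0.9, 0.3)
)

# The formula method gives `x1` the "predictor" role.
rec <- recipe(y ~ ., data = toy)

# Try to add the role `x1` already has, recording any warning message
# and muffling it so the call still returns the recipe.
caught <- NULL
rec_same <- withCallingHandlers(
  add_role(rec, x1, new_role = "predictor"),
  warning = function(w) {
    caught <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

# Count how many role rows `x1` has after the call.
n_x1_rows <- sum(summary(rec_same)$variable == "x1")
```

If the warning fires as described, `caught` holds its message and `n_x1_rows` stays at one, since the skipped variable gains no duplicate row.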
Adding or updating roles is a useful way to group certain variables that don't fall in the standard "predictor" bucket. You can perform a step on all of the variables that have a custom role with the selector has_role().
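As a minimal sketch of selecting by a custom role, the code below tags one column with an extra role and centers only that column. The data frame `toy` and the role name "to_center" are hypothetical; `step_center()`, `prep()`, and `bake()` are standard recipes functions, and `bake(new_data = NULL)` assumes a reasonably recent version of the package.

```r
library(recipes)

# Hypothetical toy data, used so the sketch does not rely on a packaged dataset.
toy <- data.frame(
  y  = c(1.2, 3.4, 2.2, 5.1),
  id = c("a", "b", "c", "d"),
  x1 = c(10, 20, 30, 40),
  x2 = c(0.5, 0.1, 0.9, 0.3)
)

rec <- recipe(y ~ ., data = toy) %>%
  # `id` is not a real predictor, so replace its role.
  update_role(id, new_role = "id variable") %>%
  # Give `x1` a second, custom role on top of "predictor".
  add_role(x1, new_role = "to_center") %>%
  # Apply the step only to columns carrying the custom role.
  step_center(has_role("to_center"))

prepped <- prep(rec, training = toy)
baked <- bake(prepped, new_data = NULL)

# `x1` should now be centered, while `x2` is left untouched.
mean_x1 <- mean(baked$x1)
mean_x2 <- mean(baked$x2)
```

Grouping by role rather than by column name keeps the step definition stable if more columns are later given the same custom role.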
An updated recipe object.
library(recipes)
data(biomass)

# Using the formula method, roles are created for any outcomes and predictors:
recipe(HHV ~ ., data = biomass) %>%
  summary()

# However `sample` and `dataset` aren't predictors. Since they already have
# roles, `update_role()` can be used to make changes:
recipe(HHV ~ ., data = biomass) %>%
  update_role(sample, new_role = "id variable") %>%
  update_role(dataset, new_role = "splitting variable") %>%
  summary()

# `update_role()` cannot set a role to NA, use `remove_role()` for that
## Not run: 
recipe(HHV ~ ., data = biomass) %>%
  update_role(sample, new_role = NA_character_)

## End(Not run)

# ------------------------------------------------------------------------------

# Variables can have more than one role. `add_role()` can be used
# if the column already has at least one role:
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, sulfur, new_role = "something") %>%
  summary()

# `update_role()` has an argument called `old_role` that is required to
# unambiguously update a role when the column currently has multiple roles.
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  update_role(carbon, new_role = "something else", old_role = "something") %>%
  summary()

# `carbon` has two roles at the end, so the last `update_role()` fails since
# `old_role` was not given.
## Not run: 
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, sulfur, new_role = "something") %>%
  update_role(carbon, new_role = "something else")

## End(Not run)

# ------------------------------------------------------------------------------

# To remove a role, `remove_role()` can be used to remove a single role.
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  remove_role(carbon, old_role = "something") %>%
  summary()

# To remove all roles, call `remove_role()` multiple times to reset to `NA`
recipe(HHV ~ ., data = biomass) %>%
  add_role(carbon, new_role = "something") %>%
  remove_role(carbon, old_role = "something") %>%
  remove_role(carbon, old_role = "predictor") %>%
  summary()

# ------------------------------------------------------------------------------

# If the formula method is not used, all columns have a missing role:
recipe(biomass) %>%
  summary()