Add update_transformers to synthesizers
Problem Description
As a user, it would be helpful to have ways to manually set custom transformers to use on my data before modeling.
Expected behavior
- Add update_transformers method to BaseSynthesizer
  - Parameters:
    - column_name_to_transformer (dict): A dictionary mapping the name of the column to the transformer instance.
  - Method should update the HyperTransformer based on the provided dict
  - Validation:
    - Errors: (Raise if one or more columns hit any of these cases. Run all the checks first; we shouldn't partially update anything.)
      - Updating a transformer that is incompatible with the sdtype provided in the metadata
        Error: Column 'age' is a numerical column, which is incompatible with the 'LabelEncoder' preprocessing.
      - Adding a transformer other than AnonymizedFaker or RegexGenerator for a key column (primary, alternate, sequence key)
        Error: Column 'user_id' is a key. It cannot be preprocessed using the 'FloatFormatter' transformer.
      - The user is assigning a transformer object that has already been fit
        Error: Transformer for column 'age' has already been fit on data.
    - Warnings: Raise all that arise
      - (CTGAN, CopulaGAN, TVAE, PAR only): Whenever the user tries to add a transformer for a column that is auto-assigned to None (boolean/categorical)
        Warning: Replacing the default transformer for column 'degree_type' might impact the quality of your synthetic data
      - (GaussianCopula): Whenever the user is adding a OneHotEncoder to a categorical column
        Warning: Using the OneHotEncoder for column 'degree_type' may slow down the preprocessing and modeling time
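The "check everything first, then update" requirement could be sketched roughly as follows. This is only an illustration, not the SDV implementation: StubTransformer, InvalidTransformerError, the COMPATIBLE table and both function signatures are hypothetical stand-ins, and the real compatibility rules would come from the metadata and RDT.

```python
class InvalidTransformerError(Exception):
    """Hypothetical error type used only in this sketch."""


# Hypothetical compatibility table: sdtype -> transformer names allowed for it.
COMPATIBLE = {
    'numerical': {'FloatFormatter'},
    'categorical': {'LabelEncoder', 'OneHotEncoder'},
}
# Transformers the issue allows on key columns.
KEY_TRANSFORMERS = {'AnonymizedFaker', 'RegexGenerator'}


class StubTransformer:
    """Stand-in for a transformer instance (assumption, not an RDT class)."""

    def __init__(self, name, fitted=False):
        self.name = name
        self.fitted = fitted


def validate_transformers(column_sdtypes, key_columns, column_name_to_transformer):
    """Collect every error across all columns, then raise once."""
    errors = []
    for column, transformer in column_name_to_transformer.items():
        sdtype = column_sdtypes[column]
        name = transformer.name
        if column in key_columns:
            if name not in KEY_TRANSFORMERS:
                errors.append(
                    f"Column '{column}' is a key. It cannot be preprocessed "
                    f"using the '{name}' transformer."
                )
        elif name not in COMPATIBLE.get(sdtype, set()):
            errors.append(
                f"Column '{column}' is a {sdtype} column, which is "
                f"incompatible with the '{name}' preprocessing."
            )
        if transformer.fitted:
            errors.append(
                f"Transformer for column '{column}' has already been fit on data."
            )
    if errors:
        raise InvalidTransformerError('\n'.join(errors))


def update_transformers(current, column_sdtypes, key_columns, column_name_to_transformer):
    """Validate first; only touch `current` if every column passed."""
    validate_transformers(column_sdtypes, key_columns, column_name_to_transformer)
    current.update(column_name_to_transformer)
```

Because validation finishes before the single update call, a failure on any column leaves the existing assignments untouched.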
@amontanez24 Could you clarify what you meant by "Whenever the user tries to add a transformer for a column that is auto-assigned to None (boolean/categorical)"?
For CTGAN, CopulaGAN and TVAE, the categorical and boolean transformations are skipped. Instead of using the default categorical transformer for them, we should use None. If a user tries to change that, we raise the warning but let them do it, since it won't technically break anything.
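The warn-but-allow behavior could look something like the sketch below. The helper names, the plain-string transformer names and the dictionary arguments are assumptions made for illustration; the set of synthesizers follows the list in the issue description above, which also includes PAR.

```python
import warnings

# Synthesizers that, per the issue description, assign None to
# boolean/categorical columns by default.
NONE_DEFAULT_SYNTHESIZERS = {'CTGAN', 'CopulaGAN', 'TVAE', 'PAR'}


def transformer_warnings(synthesizer_name, default_transformers,
                         column_sdtypes, new_transformers):
    """Return every warning message that applies (hypothetical helper).

    default_transformers: column name -> default transformer name, or None
    column_sdtypes: column name -> sdtype string
    new_transformers: column name -> name of the transformer being assigned
    """
    messages = []
    for column, transformer_name in new_transformers.items():
        replaces_none_default = (
            column in default_transformers
            and default_transformers[column] is None
        )
        if synthesizer_name in NONE_DEFAULT_SYNTHESIZERS and replaces_none_default:
            messages.append(
                f"Replacing the default transformer for column '{column}' "
                'might impact the quality of your synthetic data'
            )
        if (synthesizer_name == 'GaussianCopula'
                and transformer_name == 'OneHotEncoder'
                and column_sdtypes.get(column) == 'categorical'):
            messages.append(
                f"Using the OneHotEncoder for column '{column}' may slow down "
                'the preprocessing and modeling time'
            )
    return messages


def warn_and_update(synthesizer_name, default_transformers,
                    column_sdtypes, new_transformers):
    """Emit all applicable warnings, then apply the update anyway."""
    for message in transformer_warnings(
            synthesizer_name, default_transformers, column_sdtypes, new_transformers):
        warnings.warn(message, UserWarning)
    default_transformers.update(new_transformers)
```

Unlike the error cases, nothing here blocks the update: the user is told about the likely quality or speed cost and the assignment still goes through.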