Preprocessing on specific features

Open hanshupe opened this issue 5 years ago • 3 comments

I see that after the TPOT optimization a preprocessor like RobustScaler was selected. I wonder whether RobustScaler could be applied not to the entire set of features but only to the few where it makes sense. One feature may have outliers and require RobustScaler(0.1, 0.9), while the other features do not. Can TPOT take this into account?

hanshupe avatar Sep 05 '20 07:09 hanshupe

I think the current TPOT may not support this kind of use case. Supporting it would probably require adding scikit-learn's ColumnTransformer to TPOT.
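For reference, here is a minimal sketch of per-column scaling in plain scikit-learn, outside of TPOT. The toy data and the choice of column 0 as the outlier-prone feature are assumptions made for illustration. Note that RobustScaler's `quantile_range` is given in percent, so a 0.1 to 0.9 quantile range would be written as `(10.0, 90.0)`.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import RobustScaler

# Toy data (assumed for illustration): column 0 contains an outlier,
# columns 1 and 2 do not.
X = np.array([
    [1.0, 10.0, 0.1],
    [2.0, 11.0, 0.2],
    [3.0, 12.0, 0.3],
    [1000.0, 13.0, 0.4],
])

# Scale only column 0 and pass the remaining columns through unchanged.
# quantile_range is expressed in percent (0-100), not as fractions.
preprocessor = ColumnTransformer(
    transformers=[("robust", RobustScaler(quantile_range=(10.0, 90.0)), [0])],
    remainder="passthrough",
)

# The transformed column comes first in the output, followed by the
# passthrough columns in their original order.
X_scaled = preprocessor.fit_transform(X)
```

This only shows what the final pipeline step would look like; it does not address how TPOT would search over the choice of columns.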

weixuanfu avatar Sep 14 '20 14:09 weixuanfu

I'm interested in working on this. ColumnTransformer takes a list of columns. How would we make this list available to the genetic algorithm?

edwardyu avatar Dec 11 '20 04:12 edwardyu

OK I have an idea. In TPOTBase.fit(), add a hook called _dynamically_modify_config_dict(), which will modify config_dict with a dict like this:

    'tpot.builtins.ColumnTransformer': {
        'transformers': ['sklearn.preprocessing.StandardScaler', 'sklearn.preprocessing.RobustScaler', ...],
        'include_col_1': [True, False],
        'include_col_2': [True, False],
        ...
        'include_col_n': [True, False],
    },
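A rough sketch of how such an entry could be generated for a given number of features, and how one sampled set of flags could be decoded back into a column list. Both helper names (`build_column_transformer_config`, `selected_columns`) are hypothetical, and `tpot.builtins.ColumnTransformer` is the operator proposed here, not something TPOT currently ships.

```python
def build_column_transformer_config(n_features, transformers):
    """Hypothetical helper: build the per-column search space entry.

    Adds one boolean 'include_col_i' hyperparameter per feature (1-based),
    mirroring the dict layout proposed in this thread.
    """
    params = {"transformers": list(transformers)}
    for i in range(1, n_features + 1):
        params[f"include_col_{i}"] = [True, False]
    # The operator key below is the proposed name, assumed for illustration.
    return {"tpot.builtins.ColumnTransformer": params}


def selected_columns(chosen_params):
    """Decode one sampled parameter set into zero-based column indices."""
    prefix = "include_col_"
    cols = []
    for name, include in chosen_params.items():
        if name.startswith(prefix) and include:
            cols.append(int(name[len(prefix):]) - 1)
    return sorted(cols)


config = build_column_transformer_config(
    3,
    ["sklearn.preprocessing.StandardScaler", "sklearn.preprocessing.RobustScaler"],
)

# One parameter set the genetic algorithm might sample (made up for the example).
sample = {
    "transformers": "sklearn.preprocessing.RobustScaler",
    "include_col_1": True,
    "include_col_2": False,
    "include_col_3": True,
}
print(selected_columns(sample))  # [0, 2]
```

The decoded index list is what would be handed to scikit-learn's ColumnTransformer as its column selection. One caveat of this encoding is that the number of hyperparameters grows with the number of features, which may make the search space large for wide datasets.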

What do you think @weixuanfu ?

edwardyu avatar Dec 11 '20 04:12 edwardyu