
Does not work with pipelines

Open dth5 opened this issue 6 years ago • 4 comments

For tuning a single estimator this tool is awesome. But scikit-learn's standard GridSearchCV can also accept a pipeline as the estimator, which lets you treat the classifier itself as a parameter and evaluate different classifiers in one search.

For some reason, this breaks with EvolutionaryAlgorithmSearchCV.

For example, set up a pipeline like this:

    pipe = Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler()),
        ('classify', LogisticRegression())
    ])

Then define a parameter grid that includes different classifiers:

    param_grid_rf_big = [
        {
            'classify': [RandomForestClassifier(), ExtraTreesClassifier()],
            'classify__n_estimators': [500],
            'classify__max_features': ['log2', 'sqrt', None],
            'classify__min_samples_split': [2, 3],
            'classify__min_samples_leaf': [1, 2, 3],
            'classify__criterion': ['gini']
        }
    ]

When you pass this to EvolutionaryAlgorithmSearchCV, you should be able to set the estimator to 'pipe' and the params to 'param_grid_rf_big' and let it evaluate. This works with GridSearchCV, but not with EvolutionaryAlgorithmSearchCV.
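For reference, the GridSearchCV side of that comparison can be sketched as below. This is only a minimal illustration under assumptions: the toy data from `make_classification`, the reduced grid (fewer trees and fewer options than `param_grid_rf_big`, so it runs in seconds), and `cv=3` are not from the original report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (assumption: the original data set was never posted).
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("classify", LogisticRegression()),  # placeholder step, swapped out by the grid
])

# Reduced version of param_grid_rf_big (assumption: smaller values for speed).
param_grid = [{
    "classify": [RandomForestClassifier(random_state=0),
                 ExtraTreesClassifier(random_state=0)],
    "classify__n_estimators": [20],
    "classify__max_features": ["log2", "sqrt", None],
    "classify__min_samples_leaf": [1, 2],
}]

# GridSearchCV replaces the 'classify' step, then sets its nested parameters.
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])
best_step = type(search.best_estimator_.named_steps["classify"]).__name__
print(n_candidates, best_step)
```

The same `pipe` and grid are what one would expect to hand to EvolutionaryAlgorithmSearchCV; that call is left out here because its failure mode was not recorded in this thread.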

dth5, Aug 14 '19 20:08

Hi @dth5 ! Can you paste what error you receive when trying to execute that code? Thanks

rsteca, Aug 15 '19 11:08

Hi, unfortunately I have moved beyond that code, so I no longer have the exact run. However, if you set up a toy classification problem (X, y) and pass it to EvolutionaryAlgorithmSearchCV with estimator=pipe (from above) and param_grid_rf_big as the parameter grid, you'll see the issue.

This may not really be a bug, because RandomizedSearchCV also does not support this kind of parameter grid. GridSearchCV does, however, and the next release of scikit-learn will fix RandomizedSearchCV to allow this as well. I suspect it comes down to the fact that one usually passes a list of dictionaries { dict, dict, dict } to the grid search, but if you also want the classifier to be a parameter (with blocks of different classifiers, each with its own parameters), then you need to pass a list of lists of dictionaries [ {dict, dict, dict}, {dict}, {dict, dict} ], which is currently only possible in GridSearchCV.
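In scikit-learn's own terms, the block structure described above is a list of dictionaries, where each dictionary is one block that is expanded independently of the others. The following standard-library sketch illustrates that expansion. The helper name `expand_grid` is hypothetical, the classifier names are plain strings standing in for estimator objects, and this is not how sklearn-deap or scikit-learn actually implement it.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield every parameter combination from a dict or a list of dicts.

    Hypothetical helper: each dict is one block, and combinations are
    formed only within a block, never across blocks.
    """
    if isinstance(param_grid, dict):
        param_grid = [param_grid]
    for block in param_grid:
        keys = sorted(block)
        for values in product(*(block[key] for key in keys)):
            yield dict(zip(keys, values))


# Two blocks: tree ensembles with their parameters, and a linear model with its own.
grid = [
    {"classify": ["RandomForest", "ExtraTrees"],
     "classify__n_estimators": [100, 500]},
    {"classify": ["LogisticRegression"],
     "classify__C": [0.1, 1.0, 10.0]},
]

candidates = list(expand_grid(grid))
for candidate in candidates:
    print(candidate)
```

Because combinations never cross block boundaries, a tree-only parameter such as `classify__n_estimators` is never paired with the linear model. A search tool that only accepts a single dictionary cannot express that separation.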

dth5, Aug 15 '19 13:08

OK, thanks. I will try to fix it when I have some free time.

rsteca, Aug 15 '19 13:08

Hello,

I have managed to tune hyperparameters for a classifier within a pipeline using the latest version, 0.3. However, when I try to tune hyperparameters for transformers within the same pipeline, it throws an error: "ValueError: Provided coef_init does not match dataset." The same pipeline, with hyperparameter spaces for both transformers and classifiers, works fine with GridSearchCV and RandomizedSearchCV, and also with external tools such as BayesSearchCV, OptunaSearchCV, and TuneSearchCV.
Let me know if you need more information.

Narayan

RNarayan73, Oct 12 '21 19:10