Cannot reproduce pipeline results with sklearn pipeline

Open DrRaja opened this issue 3 years ago • 1 comments

For my data, I got the best pipeline by running TPOT training using the following parameters:

from tpot import TPOTClassifier
tpot = TPOTClassifier(generations=5,
                      population_size=100,
                      verbosity=2,
                      n_jobs=-1,
                      random_state=1)

The best pipeline was given as:

Best pipeline: MLPClassifier(GaussianNB(Binarizer(input_matrix, threshold=0.0)), alpha=0.001, learning_rate_init=0.001)
TPOTClassifier(generations=5, n_jobs=-1, random_state=1, verbosity=2)

The best CV score I achieved was 0.822

Using the ensemble provided above I trained an ensemble pipeline using sklearn as:

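from sklearn.ensemble import StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

base_model = GaussianNB()

meta_model = MLPClassifier(random_state=1,
                           learning_rate_init=0.001,
                           alpha=0.001)

ensemble = StackingClassifier(estimators=[('base_model', base_model),
                                          ('meta_model', meta_model)],
                              final_estimator=meta_model,
                              n_jobs=-1)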

The score I get from this is 0.79

Can you tell me why I am getting different scores when all my parameters are the same?

DrRaja, Feb 16 '23 13:02

The manual pipeline is not exactly identical to the TPOT output. It is missing the Binarizer step.

Also, TPOT wraps internal classifiers in a StackingEstimator. This will pass through its inputs in addition to its predictions. (https://github.com/EpistasisLab/tpot/blob/master/tpot/builtins/stacking_estimator.py).
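
To make the pass-through behaviour concrete, here is a minimal sketch of a transformer that keeps its inputs and adds the wrapped model's predictions as one extra column. The class name PassThroughStacker and the random data are hypothetical and only for illustration; this is a simplified stand-in, not TPOT's actual implementation, which may add further columns (such as class probabilities), so treat the linked source as the authority.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.naive_bayes import GaussianNB


class PassThroughStacker(BaseEstimator, TransformerMixin):
    """Hypothetical, simplified stand-in for TPOT's StackingEstimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit a fresh copy of the wrapped model on the incoming features.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Put the predicted label in a new first column and keep the inputs.
        preds = self.estimator_.predict(X).reshape(-1, 1)
        return np.hstack([preds, X])


# Made-up data, only to show the change in shape.
rng = np.random.RandomState(0)
X = rng.rand(20, 4)
y = np.array([0, 1] * 10)

stacked = PassThroughStacker(GaussianNB()).fit(X, y).transform(X)
print(X.shape, stacked.shape)
```

The next step in a pipeline therefore sees one more column than the wrapped model did, which is different from sklearn's StackingClassifier, where by default the final estimator only sees the base models' outputs.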

Going off memory, I believe this is what the TPOT output would be equivalent to:

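from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer
from tpot.builtins import StackingEstimator

# Step 1: binarize the input features
step1 = Binarizer(threshold=0.0)

# Step 2: GaussianNB wrapped so its predictions are appended to the features
base_model = StackingEstimator(GaussianNB())

# Step 3: final classifier with the hyperparameters TPOT reported
meta_model = MLPClassifier(random_state=1,
                           learning_rate_init=0.001,
                           alpha=0.001)

# Pipeline takes a list of steps; the last step is the final estimator
ensemble = Pipeline(steps=[('step1', step1),
                           ('base_model', base_model),
                           ('meta_model', meta_model)])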

The data flow is: input data -> Binarizer -> binarized data -> GaussianNB -> binarized data + predictions -> MLPClassifier
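
The flow above can also be traced by hand, one stage at a time, using only sklearn and NumPy. This is a rough sketch on made-up random data (not the dataset from the issue); it appends just the predicted label in stage 3, whereas TPOT's StackingEstimator may add more columns, and max_iter=500 is an arbitrary choice for the toy data.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import Binarizer

# Made-up data, only to follow the shapes through each stage.
rng = np.random.RandomState(1)
X = rng.randn(60, 5)
y = (X.sum(axis=1) > 0).astype(int)

# Stage 1: Binarizer maps every value to 0 or 1 (threshold=0.0).
X_bin = Binarizer(threshold=0.0).fit_transform(X)

# Stage 2: GaussianNB is fitted on the binarized data.
nb = GaussianNB().fit(X_bin, y)

# Stage 3: the predictions are added as an extra column next to the
# binarized features (simplified: predicted label only).
X_stacked = np.hstack([nb.predict(X_bin).reshape(-1, 1), X_bin])

# Stage 4: the MLP trains on the widened matrix.
mlp = MLPClassifier(alpha=0.001, learning_rate_init=0.001,
                    random_state=1, max_iter=500).fit(X_stacked, y)

print(X.shape, X_bin.shape, X_stacked.shape)
```

The MLP never sees the raw features here, only their binarized form plus the GaussianNB output, which is why leaving out the Binarizer step changes what the final model is trained on.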

perib, May 09 '23 00:05