
How to print the second-best pipeline?

Open m-alshehri opened this issue 4 years ago • 1 comments

Hello, I was just wondering if there's any way to print out the confusion matrix, classification report and the pipeline for the second-best pipeline?

The model currently prints the best pipeline, as shown below, but I would also like to print the second-best pipeline.

model = TPOTClassifier(generations=10, scoring='balanced_accuracy', verbosity=2)
model.fit(X_train, y_train)
Optimization Progress: 48% 530/1100 [2:07:27<3:41:41, 23.34s/pipeline]

Generation 1 - Current best internal CV score: 0.8820838802533277
Generation 2 - Current best internal CV score: 0.8828284663262757
Generation 3 - Current best internal CV score: 0.8828284663262757
Generation 4 - Current best internal CV score: 0.8842320902149032
Generation 5 - Current best internal CV score: 0.8842320902149032
Generation 6 - Current best internal CV score: 0.8842320902149032
Generation 7 - Current best internal CV score: 0.8842320902149032
Generation 8 - Current best internal CV score: 0.8842320902149032
Generation 9 - Current best internal CV score: 0.8842320902149032
Generation 10 - Current best internal CV score: 0.8842320902149032
Best pipeline: BernoulliNB(KNeighborsClassifier(input_matrix, n_neighbors=41, p=1, weights=uniform), alpha=0.01, fit_prior=True)
TPOTClassifier(config_dict=None, crossover_rate=0.1, cv=5,
               disable_update_check=False, early_stop=None, generations=10,
               log_file=None, max_eval_time_mins=5, max_time_mins=None,
               memory=None, mutation_rate=0.9, n_jobs=1, offspring_size=None,
               periodic_checkpoint_folder=None, population_size=100,
               random_state=None, scoring='balanced_accuracy', subsample=1.0,
               template=None, use_dask=False, verbosity=2, warm_start=False)
Acc.: 0.8521771865980675
              precision    recall  f1-score   support

           0       1.00      0.85      0.92      7902
           1       0.05      0.97      0.10        67

    accuracy                           0.85      7969
   macro avg       0.53      0.91      0.51      7969
weighted avg       0.99      0.85      0.91      7969
Confusion Matrix:
[[6726 1176]
 [   2   65]]

Apologies if this was asked before, but searching for "Second Best" returned nothing.

Thanks, m-alshehri

m-alshehri avatar Sep 27 '21 11:09 m-alshehri

Give this a try:

import pandas as pd

# `tpot` is the fitted TPOTClassifier (named `model` in the question above).
# evaluated_individuals_ maps each pipeline string to a dict of its stats.
rows = []
for model_name, model_info in tpot.evaluated_individuals_.items():
    rows.append({'model': model_name,
                 # Pull out cv_score as a column (i.e., sortable)
                 'cv_score': model_info.get('internal_cv_score'),
                 'model_info': model_info})

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
model_scores = pd.DataFrame(rows)
model_scores = model_scores.sort_values('cv_score', ascending=False)
top_models = model_scores.iloc[0:5, :]
top_models.to_csv('top_models.csv', index=False)

See https://github.com/EpistasisLab/tpot/issues/703
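
To pull out just the second-ranked entry rather than a top-5 table, something like the following might work. This is a minimal sketch, assuming `evaluated_individuals_` is a dict that maps pipeline strings to dicts containing an `internal_cv_score` key, which is what the snippet above relies on. The sample dict and the `rank_pipelines` helper are hypothetical and only stand in for a fitted TPOT object.

```python
# Made-up sample data shaped like the `evaluated_individuals_` dict used above:
# keys are pipeline strings, values are dicts of per-pipeline stats.
evaluated = {
    "GaussianNB(input_matrix)": {"internal_cv_score": 0.84},
    "BernoulliNB(input_matrix, alpha=0.01, fit_prior=True)": {"internal_cv_score": 0.88},
    "DecisionTreeClassifier(input_matrix, max_depth=3)": {"internal_cv_score": 0.86},
}


def rank_pipelines(evaluated_individuals):
    """Return (pipeline_string, cv_score) pairs sorted with the best score first."""
    scored = [
        (name, info["internal_cv_score"])
        for name, info in evaluated_individuals.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_pipelines(evaluated)

# Index 0 is the best pipeline, index 1 the second best.
second_best_name, second_best_score = ranked[1]
print(second_best_name, second_best_score)
```

With a real run, `evaluated` would be replaced by `tpot.evaluated_individuals_`. Note that this only yields the pipeline string and its CV score; a confusion matrix and classification report would still require building and fitting that pipeline and predicting on the test set.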

wayneking517 avatar Nov 03 '21 17:11 wayneking517