tpot
How to print the second-best pipeline?
Hello, I was wondering whether there is a way to print the confusion matrix, classification report, and pipeline definition for the second-best pipeline.
The model currently prints the best pipeline, as shown below, but I would also like to print the second-best one.
model = TPOTClassifier(generations=10, scoring='balanced_accuracy', verbosity=2)
model.fit(X_train, y_train)
Optimization Progress: 48% 530/1100 [2:07:27<3:41:41, 23.34s/pipeline]
Generation 1 - Current best internal CV score: 0.8820838802533277
Generation 2 - Current best internal CV score: 0.8828284663262757
Generation 3 - Current best internal CV score: 0.8828284663262757
Generation 4 - Current best internal CV score: 0.8842320902149032
Generation 5 - Current best internal CV score: 0.8842320902149032
Generation 6 - Current best internal CV score: 0.8842320902149032
Generation 7 - Current best internal CV score: 0.8842320902149032
Generation 8 - Current best internal CV score: 0.8842320902149032
Generation 9 - Current best internal CV score: 0.8842320902149032
Generation 10 - Current best internal CV score: 0.8842320902149032
Best pipeline: BernoulliNB(KNeighborsClassifier(input_matrix, n_neighbors=41, p=1, weights=uniform), alpha=0.01, fit_prior=True)
TPOTClassifier(config_dict=None, crossover_rate=0.1, cv=5,
disable_update_check=False, early_stop=None, generations=10,
log_file=None, max_eval_time_mins=5, max_time_mins=None,
memory=None, mutation_rate=0.9, n_jobs=1, offspring_size=None,
periodic_checkpoint_folder=None, population_size=100,
random_state=None, scoring='balanced_accuracy', subsample=1.0,
template=None, use_dask=False, verbosity=2, warm_start=False)
Acc.: 0.8521771865980675
              precision    recall  f1-score   support
           0       1.00      0.85      0.92      7902
           1       0.05      0.97      0.10        67
    accuracy                           0.85      7969
   macro avg       0.53      0.91      0.51      7969
weighted avg       0.99      0.85      0.91      7969
Confusion Matrix:
[[6726 1176]
[ 2 65]]
Apologies if this has been asked before, but searching for "second best" returned nothing.
Thanks, m-alshehri
Give this a try:
import pandas as pd

# `model` is the fitted TPOTClassifier from the question.
# evaluated_individuals_ maps each pipeline string to a dict of its stats.
rows = []
for pipeline_str, info in model.evaluated_individuals_.items():
    rows.append({'model': pipeline_str,
                 'cv_score': info.get('internal_cv_score'),  # pull out cv_score as a sortable column
                 'model_info': info})

# Build the frame in one go (DataFrame.append was removed in pandas 2.0)
model_scores = pd.DataFrame(rows).sort_values('cv_score', ascending=False)
top_models = model_scores.iloc[0:5, :]  # five best pipelines; row 1 is the second best
top_models.to_csv('top_models.csv', index=False)
See https://github.com/EpistasisLab/tpot/issues/703
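If you only need the second-best entry rather than a CSV of the top five, the same ranking idea can be sketched without pandas. This is a rough sketch, not TPOT's own API: `nth_best` is a hypothetical helper, and the `example` dict is a made-up stand-in for `model.evaluated_individuals_` with illustrative scores (the `-inf` entry mimics how a failed pipeline might be recorded, which is an assumption on my part).

```python
from typing import Any, Dict, Tuple


def nth_best(evaluated: Dict[str, Dict[str, Any]], n: int = 2) -> Tuple[str, float]:
    """Return (pipeline_string, cv_score) of the n-th best entry (1-indexed)."""
    # Sort pipeline entries by internal CV score, highest first.
    ranked = sorted(
        evaluated.items(),
        key=lambda item: item[1]["internal_cv_score"],
        reverse=True,
    )
    if not 1 <= n <= len(ranked):
        raise ValueError(f"n must be between 1 and {len(ranked)}")
    name, info = ranked[n - 1]
    return name, info["internal_cv_score"]


# Made-up stand-in for `model.evaluated_individuals_`; scores are illustrative only.
example = {
    "BernoulliNB(input_matrix, alpha=0.01, fit_prior=True)": {"internal_cv_score": 0.88},
    "GaussianNB(input_matrix)": {"internal_cv_score": 0.87},
    "DecisionTreeClassifier(input_matrix, max_depth=3)": {"internal_cv_score": float("-inf")},
}

second_name, second_score = nth_best(example, n=2)
print(second_name, second_score)
```

With a real run you would pass `model.evaluated_individuals_` instead of `example`. Note that several pipelines can share the same CV score, so "second best" may tie with the best. This only gives you the pipeline string and its CV score; to get a confusion matrix and classification report you would still need to rebuild and fit that pipeline on your training data, then score it on the test set the same way you did for the best pipeline.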