FutureWarning + KeyError
Hey there,
I was giving auto_ml a shot but it chokes on a KeyError.
Platform: Win 10, x64 Python 3.6.4, auto_ml 2.9.4
Installed with pip install auto_ml.
After the import I get a warning:
C:\Python36\lib\site-packages\deap\tools_hypervolume\pyhv.py:33: ImportWarning: Falling back to the python version of hypervolume module. Expect this to be very slow.
"module. Expect this to be very slow.", ImportWarning)
When running train() I end up with this:
Calculating feature responses, for advanced analytics.
C:\Python36\lib\site-packages\sklearn\model_selection\_split.py:2026: FutureWarning: From version 0.21, test_size will always complement train_size unless both are specified.
FutureWarning)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-45-23d85ace19e4> in <module>()
----> 1 ml_predictor.train(X_train)
C:\Python36\lib\site-packages\auto_ml\predictor.py in train(***failed resolving arguments***)
639
640 # This is our main logic for how we train the final model
--> 641 self.trained_final_model = self.train_ml_estimator(self.model_names, self._scorer, X_df, y)
642
643 if self.ensemble_config is not None and len(self.ensemble_config) > 0:
C:\Python36\lib\site-packages\auto_ml\predictor.py in train_ml_estimator(self, estimator_names, scoring, X_df, y, feature_learning, prediction_interval)
1202 # Use Case 1: Super straightforward: just train a single, non-optimized model
1203 elif (feature_learning == True and self.optimize_feature_learning != True) or (len(estimator_names) == 1 and self.optimize_final_model != True):
-> 1204 trained_final_model = self.fit_single_pipeline(X_df, y, estimator_names[0], feature_learning=feature_learning, prediction_interval=False)
1205
1206 # Use Case 2: Compare a bunch of models, but don't optimize any of them
C:\Python36\lib\site-packages\auto_ml\predictor.py in fit_single_pipeline(self, X_df, y, model_name, feature_learning, prediction_interval)
854 # That saves a considerable amount of time
855 if feature_learning == False:
--> 856 self.print_results(model_name, ppl, X_df, y)
857
858 return ppl
C:\Python36\lib\site-packages\auto_ml\predictor.py in print_results(self, model_name, model, X, y)
1026 else:
1027 feature_responses = self.create_feature_responses(model, X, y, top_features)
-> 1028 self._join_and_print_analytics_results(feature_responses, sorted_model_results, sort_field='Importance')
1029 except AttributeError as e:
1030 if model_name == 'XGBRegressor':
C:\Python36\lib\site-packages\auto_ml\predictor.py in _join_and_print_analytics_results(self, df_feature_responses, df_features, sort_field)
1487
1488 # Sort by coefficients or feature importances
-> 1489 df_results = df_results[['Feature Name', sort_field, 'Delta', 'FR_Decrementing', 'FR_Incrementing', 'FRD_abs', 'FRI_abs', 'FRD_MAD', 'FRI_MAD']]
1490 else:
1491 df_results = df_features
C:\Python36\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2131 if isinstance(key, (Series, np.ndarray, Index, list)):
2132 # either boolean or fancy integer index
-> 2133 return self._getitem_array(key)
2134 elif isinstance(key, DataFrame):
2135 return self._getitem_frame(key)
C:\Python36\lib\site-packages\pandas\core\frame.py in _getitem_array(self, key)
2175 return self._take(indexer, axis=0, convert=False)
2176 else:
-> 2177 indexer = self.loc._convert_to_indexer(key, axis=1)
2178 return self._take(indexer, axis=1, convert=True)
2179
C:\Python36\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
1267 if mask.any():
1268 raise KeyError('{mask} not in index'
-> 1269 .format(mask=objarr[mask]))
1270
1271 return _values_from_object(indexer)
KeyError: "['Delta' 'FR_Decrementing' 'FR_Incrementing' 'FRD_abs' 'FRI_abs' 'FRD_MAD'\n 'FRI_MAD'] not in index"
Where my code is:
from auto_ml import Predictor
import pandas as pd
from sklearn.model_selection import train_test_split
file = "./data/verified.normalized_full.csv"
X = pd.read_csv(file, header=None)
X.columns = ['id', 'title', 'content', 'label']
X = X.drop(['id'], axis=1)
y = X['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
column_descriptions = {
    'title': 'nlp',
    'content': 'nlp',
    'label': 'output'
}
ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
ml_predictor.train(X_train)
ml_predictor.score(X_test, X_test.label)
I'm sorry if this is just a usage error. I'm still going through the docs, but since they say "run first", that's what I did ;)
crap, sorry i didn't get to this 'til now.
you're totally right to run code first! this is just a blatant bug on my part that i haven't been able to reproduce. thank you very much for including the full traceback and the script you used to train the data- that's all super helpful.
it shows that it's probably an error with NLP data. i'm shipping a workaround tonight. will work on an actual fix soon.
sorry again for the slow response here. i'm extra bummed because you have one of the use cases that i explicitly designed this package for, and it's really cool to me to see how straightforward the code is that you wrote to train an nlp predictor.
let me know if you have any other feedback! or if you ended up using a different package, i'd love to hear that too- there's a lot these automated ml solutions can learn from each other.
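For anyone hitting the same traceback: the failure is the column selection in _join_and_print_analytics_results, which asks for feature-response columns ('Delta', 'FR_Decrementing', ...) that are not in the joined DataFrame. Why they are missing for NLP columns isn't established in this thread. The snippet below is only a minimal sketch of that failure mode and of one defensive pattern (selecting only the columns that exist). The DataFrame contents and the variable names wanted, present and safe_view are made up for illustration; this is not the workaround that shipped in auto_ml.

```python
import pandas as pd

# Hypothetical stand-in for the feature-importance table. The real one is
# built inside auto_ml; here the feature-response columns are simply absent.
df_results = pd.DataFrame({
    "Feature Name": ["title", "content"],
    "Importance": [0.7, 0.3],
})

# Subset of the columns requested at predictor.py line 1489 in the traceback.
wanted = ["Feature Name", "Importance", "Delta", "FR_Decrementing", "FR_Incrementing"]

# Selecting a list of labels raises KeyError if any label is missing.
try:
    df_results[wanted]
    raised = False
except KeyError:
    raised = True

# Defensive pattern (an assumption, not auto_ml's fix): keep only the
# requested columns that are actually present, preserving their order.
present = [col for col in wanted if col in df_results.columns]
safe_view = df_results[present]
```

With this pattern the analytics printout would degrade to the columns that are available instead of aborting train().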
Hey there, thanks for the reply.
I've created my own scripts for normalization, tf/tf-idf uni-/bigram chi2 selection etc. and finally used Keras with TensorFlow. However, I'm still interested in this package, even if it's just to see what architecture and parameters were chosen.