Extend predictors to zero-cost case
In the zerocost branch, the Ensemble class has been extended to the zero-cost case and contains a single option for its base predictor, XGBoost. The XGBoost predictor was adapted to the zero-cost case by implementing a set_pre_computations function and by modifying the BaseTree class. Currently, the Ensemble class supports only this one predictor:
trainable_predictors = {
    "xgb": XGBoost(
        ss_type=self.ss_type, zc=self.zc, encoding_type="adjacency_one_hot", zc_only=self.zc_only
    )
}
Should all other predictors be available in the merged Ensemble class and extended to the zero-cost case?
Ideally, yes. But there are about 19 predictors in the original ensemble. Let's focus for now on extending the ZC case to the tree-based predictors: LGBoost, NGBoost, and RandomForestPredictor.
The other predictors must of course remain available, but without the option of using a ZC predictor. This also means that the get_ensemble method must be modified to return the appropriate set of predictors based on self.zc.
Sounds good.
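A minimal sketch of how get_ensemble could gate the predictor set on self.zc. The predictor classes below are stand-ins for the real XGBoost, LGBoost, etc. (their constructors and the GPPredictor example are assumptions, not code from either branch):

```python
# Hypothetical sketch: only tree-based predictors support the ZC
# extension; everything else is constructed without it.

class Predictor:
    def __init__(self, zc=False, zc_only=False):
        self.zc = zc
        self.zc_only = zc_only

class XGBoost(Predictor): pass
class LGBoost(Predictor): pass
class GPPredictor(Predictor): pass  # example of a non-ZC predictor

class Ensemble:
    def __init__(self, zc=False, zc_only=False):
        self.zc = zc
        self.zc_only = zc_only

    def get_ensemble(self):
        if self.zc:
            # ZC case: return only the ZC-capable tree-based predictors.
            return {
                "xgb": XGBoost(zc=True, zc_only=self.zc_only),
                "lgb": LGBoost(zc=True, zc_only=self.zc_only),
            }
        # Non-ZC case: all predictors, built without ZC features.
        return {
            "xgb": XGBoost(zc=False),
            "lgb": LGBoost(zc=False),
            "gp": GPPredictor(zc=False),
        }
```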
We found that in the zerocost branch, the XGBoost class contains three functions specific to the zero-cost case (set_pre_computations, _verify_zc_info, and _set_zc_names) that are also applicable to the other tree-based predictors.
To avoid duplicating these functions, we moved them into the BaseTree class, which is the parent class of all tree-based predictors.
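The hoisting can be sketched as follows. The method bodies here are hypothetical placeholders (only the function names come from the branch); the point is that all tree-based subclasses inherit the ZC helpers from BaseTree:

```python
# Hypothetical sketch of the shared ZC helpers living in BaseTree so
# that XGBoost, LGBoost, NGBoost, and RandomForestPredictor inherit them.

class BaseTree:
    def set_pre_computations(self, zc_info=None):
        # Store pre-computed zero-cost proxy scores for later use.
        self._verify_zc_info(zc_info)
        self.zc_info = zc_info
        self._set_zc_names(zc_info)

    def _verify_zc_info(self, zc_info):
        # Placeholder check; the real validation is richer.
        if zc_info is not None and not isinstance(zc_info, dict):
            raise TypeError("zc_info must be a dict of proxy scores")

    def _set_zc_names(self, zc_info):
        # Record which zero-cost proxies are available.
        self.zc_names = sorted(zc_info) if zc_info else []

class XGBoost(BaseTree): pass
class LGBoost(BaseTree): pass
```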
One small remaining issue is a discrepancy between the zerocost and Develop implementations of fit in XGBoost.
In the Develop branch, the user can load custom hyperparameters from a config file:
def fit(self, xtrain, ytrain, train_info=None, params=None, **kwargs):
    if self.hparams_from_file and self.hparams_from_file not in ['False', 'None'] \
            and os.path.exists(self.hparams_from_file):
        self.hyperparams = json.load(open(self.hparams_from_file, 'rb'))['xgb']
        print('loaded hyperparams from', self.hparams_from_file)
    elif self.hyperparams is None:
        self.hyperparams = self.default_hyperparams.copy()
    return super(XGBoost, self).fit(xtrain, ytrain, train_info, params, **kwargs)
while in the zerocost branch, this is not an option:
def fit(self, xtrain, ytrain, train_info=None, params=None, **kwargs):
    if self.hyperparams is None:
        self.hyperparams = self.default_hyperparams.copy()
    return super(XGBoost, self).fit(xtrain, ytrain, train_info, params, **kwargs)
Which functionality should be adopted in the Develop_copy branch? Is this a case where the code in the zerocost branch should be taken as the more up-to-date version?
Best to be able to read from config file, too.
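A possible merged fit for Develop_copy that keeps the config-file path from Develop while remaining compatible with the zerocost branch. This is a sketch, not the final implementation: the BaseTree stub and the default hyperparameter values are placeholders, and only the branching logic mirrors the two snippets above.

```python
import json
import os

# Stand-in for the real BaseTree; the real fit trains the model.
class BaseTree:
    def fit(self, xtrain, ytrain, train_info=None, params=None, **kwargs):
        return self.hyperparams

class XGBoost(BaseTree):
    # Placeholder defaults for illustration only.
    default_hyperparams = {"max_depth": 6, "learning_rate": 0.1}

    def __init__(self, hparams_from_file=None):
        self.hparams_from_file = hparams_from_file
        self.hyperparams = None

    def fit(self, xtrain, ytrain, train_info=None, params=None, **kwargs):
        # Merged behavior: prefer hyperparameters from a config file
        # (Develop); otherwise fall back to the defaults (zerocost).
        if self.hparams_from_file and self.hparams_from_file not in ['False', 'None'] \
                and os.path.exists(self.hparams_from_file):
            with open(self.hparams_from_file) as f:
                self.hyperparams = json.load(f)['xgb']
            print('loaded hyperparams from', self.hparams_from_file)
        elif self.hyperparams is None:
            self.hyperparams = self.default_hyperparams.copy()
        return super().fit(xtrain, ytrain, train_info, params, **kwargs)
```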