
Incompatibility with ExplainableBoostingRegressor: unsupported feature type 'ordinal'

Open zuberman35 opened this issue 2 years ago • 5 comments

After training an EBM regressor with the feature types manually set to 'nominal' or 'ordinal' (since 'categorical' is not supported), I cannot create a gc visualization, even when I manually change feature_types afterwards. See the example below for the MPG dataset.

ebm = ExplainableBoostingRegressor(
    feature_names=['displacement', 'horsepower', 'weight', 'acceleration', 'origin', 'cylinders', 'model_year'],
    feature_types=['continuous', 'continuous', 'continuous', 'continuous', 'nominal', 'ordinal', 'ordinal'],
    random_state=42,
    n_jobs=-1
)
ebm.fit(X_train, y_train)
ebm.feature_types = ['continuous', 'continuous', 'continuous', 'continuous', 'categorical', 'categorical', 'categorical']

gc only seems to work with feature_type='none'.

Is there a workaround or fix?

zuberman35 avatar Sep 08 '23 10:09 zuberman35

To hack this, try this instead (NOTE: I have not tested this):

ebm.feature_types_in_ = ['continuous', 'continuous', 'continuous', 'continuous', 'categorical', 'categorical', 'categorical']

paulbkoch avatar Sep 08 '23 19:09 paulbkoch

Actually, looking at the code I think it will instead need to be:

ebm.feature_types_in_ = ['continuous', 'continuous', 'continuous', 'continuous', 'nominal', 'nominal', 'nominal']

This is because the code linked below supports 'nominal', but not 'categorical' or 'ordinal'. For prediction, 'nominal' and 'ordinal' behave identically:

https://github.com/interpretml/gam-changer/blob/aba94f624c726e63ef7360c5c691aedac3a44bd4/notebook-widget/gamchanger/gamchanger.py#L64

https://github.com/interpretml/gam-changer/blob/aba94f624c726e63ef7360c5c691aedac3a44bd4/notebook-widget/gamchanger/gamchanger.py#L84

paulbkoch avatar Sep 08 '23 19:09 paulbkoch
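
The workaround above can be sketched as a small helper that rewrites a feature-type list so that only 'nominal' is used for the categorical-like columns. This is a minimal sketch, not part of gam-changer or interpret: the helper name `coerce_feature_types` and the `UNSUPPORTED` set are hypothetical, and the set only reflects the two types named in this thread.

```python
# Hypothetical helper: feature types that gam-changer reportedly rejects,
# according to this thread (not an exhaustive or authoritative list).
UNSUPPORTED = {"ordinal", "categorical"}


def coerce_feature_types(feature_types):
    """Return a new list with unsupported types replaced by 'nominal'."""
    return ["nominal" if ft in UNSUPPORTED else ft for ft in feature_types]


types = [
    "continuous", "continuous", "continuous", "continuous",
    "nominal", "ordinal", "ordinal",
]
print(coerce_feature_types(types))

# With a fitted model this would be applied as follows (assumption: the
# fitted attribute is feature_types_in_, as used in the comment above):
# ebm.feature_types_in_ = coerce_feature_types(ebm.feature_types_in_)
```

The helper returns a new list rather than mutating its input, so the original feature types can be restored on the model afterwards if needed.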

This worked, thanks for the lightning fast response :)

zuberman35 avatar Sep 09 '23 13:09 zuberman35

This fix works in the sense that the gc visual is displayed, but the metrics (right panel) are now gone; see the screenshot below.

zuberman35 avatar Sep 09 '23 17:09 zuberman35

Two questions:

  1. If you train the model using only nominals (no ordinals), does it still have this visualization issue? My guess is that it will.
  2. What datatypes are the columns of X? If you force the nominal and ordinal columns to be strings, does that fix the issue?

paulbkoch avatar Sep 12 '23 04:09 paulbkoch
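
For the second question, one way to force the nominal and ordinal columns to strings is sketched below. It assumes X is a pandas DataFrame; the helper name `stringify_columns` is hypothetical, and the column names are taken from the MPG example at the top of the thread. Whether this restores the metrics panel is untested.

```python
import pandas as pd


def stringify_columns(df, columns):
    """Return a copy of df with the given columns cast to strings."""
    out = df.copy()
    out[columns] = out[columns].astype(str)
    return out


# Tiny stand-in for the MPG data (illustrative values only).
X = pd.DataFrame({
    "weight": [3504.0, 3693.0],
    "origin": [1, 3],
    "cylinders": [8, 4],
    "model_year": [70, 82],
})

X_str = stringify_columns(X, ["origin", "cylinders", "model_year"])
print(X_str.dtypes)
```

The same conversion would be applied to both the training data and whatever sample is passed to gc, so the category labels match on both sides.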