
Smoothing only for some features defined by the user

Open m-maggi opened this issue 5 months ago • 12 comments

Hi, thanks for the great package. I would like to use the smoothing_rounds argument only for a subset of the features of my model; the rest of the variables should have smoothing_rounds=0. This is because I know which features are smooth and which features I want to model "un-smoothly". I looked at the examples here https://interpret.ml/docs/python/examples/interpretable-regression-synthetic.html and here https://interpret.ml/docs/python/examples/custom-interactions.html.

Currently I am merging two models: one where I exclude all the non-smooth features, and one where I exclude all the smooth features (I do this using the exclude argument). I follow the steps of the "custom interactions" example, adjusting the bin weights, scaling, and adding intercepts.

I don't know the algorithms in detail: would the described model merge be enough, or would one rather need an iterative procedure where mod_smooth and mod_non_smooth are each trained with one boosting round at a time, fitted via init_score using the values from the previous iteration? With such a procedure both the smooth and the non-smooth model would always be updated, rather than one model being fully trained and then the second model being trained using init_score.

m-maggi avatar Aug 07 '25 16:08 m-maggi

Thanks @m-maggi -- Glad to hear you are finding the package useful.

Smoothing rounds are run at the start before any other type of boosting occurs. As such, I'd modify your procedure as follows:

Step 1: Fit an EBM that excludes all non-smooth features. Make max_rounds equal to smoothing_rounds. You'll end up with a very simple EBM that has smooth graphs on your smooth terms and the non-smooth terms will be unused.

Step 2: Fit a new EBM passing in the EBM from step 1 via the init_score parameter and setting smoothing_rounds=0.

Step 3: Merge the two EBMs, doubling the term score as described in the custom-interactions notebook you linked to.
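
The doubling in step 3 compensates for the merge: merge_ebms (loosely speaking) averages the two models' contributions for each term, so a term that only one model learned comes out at half strength, and the summed bin weights double. Here's a plain-numpy sketch of that arithmetic (illustrative only, not the interpret internals):

```python
import numpy as np

# One term's scores: model A learned it; model B excluded the feature,
# so its contribution for this term is all zeros.
scores_a = np.array([0.5, -0.2, 0.1])
scores_b = np.zeros_like(scores_a)

# Merging by averaging halves the learned contribution...
merged = (scores_a + scores_b) / 2.0

# ...so scaling the merged term by 2 restores it.
restored = merged * 2.0
assert np.allclose(restored, scores_a)

# Both models saw the full training set, so the merged bin weights are
# doubled; multiplying by 0.5 restores the original sample counts.
weights = np.array([100.0, 250.0, 150.0])
merged_weights = (weights + weights) * 0.5
assert np.allclose(merged_weights, weights)
```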

If you want to paste your code, I'd be happy to have a look.

paulbkoch avatar Aug 08 '25 09:08 paulbkoch

Thanks for your reply @paulbkoch. I made a fork to show you what I mean by smoothing only some variables. It's currently hardcoded; what's your opinion on having an argument to select which features should be smoothed? I would be glad to contribute (or at least try to).

https://github.com/m-maggi/interpret/blob/9949c1e5b51859c706b6af75d9384a26e796485e/python/interpret-core/interpret/glassbox/_ebm/_boost.py#L197-L210

If you rerun the example https://interpret.ml/docs/python/examples/interpretable-regression-synthetic.html with this change you get smoothed global explanations only for the first three features.

I went this way because with the merging I was not getting the results I expected.

m-maggi avatar Aug 08 '25 16:08 m-maggi

Hi @m-maggi -- The methodology I proposed above should be either identical or nearly identical to how it would work if incorporated as a new parameter. Fitting two separate EBMs with different subsets of the features won't be quite as performant, especially when there is correlation between the smooth and non-smooth features. Also, in that case both of the EBMs are fully fit, so you wouldn't adjust the scores or weights before merging.

Since there is a way to achieve this externally, I'd be hesitant to add a new parameter to the already long list of existing parameters. I could see modifying the smoothing_rounds parameter to accept a list of integers where the list of integers is the number of smoothing rounds for each feature. This wouldn't introduce a new parameter.

E.g.: smoothing_rounds = [1000, 1000, 0, 0, 0, 1000]  # smooth features 0, 1, and 5

It's a bit trickier if the user specifies different numbers of smoothing rounds for each feature, but I guess in that case the right thing to do would be to continue to do smoothing while any feature hasn't reached its specified number of smoothing rounds.
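
That rule can be sketched in a few lines of plain Python (a hypothetical helper, not existing interpret code): smoothing runs for max(smoothing_rounds) rounds overall, and on each round only the features with remaining budget get the smoothing treatment.

```python
def smoothing_schedule(smoothing_rounds):
    """Yield, for each boosting round, the set of feature indices that
    should still be boosted in smoothing mode on that round.

    smoothing_rounds is a per-feature list of ints, e.g. [1000, 1000, 0].
    Smoothing continues while any feature has rounds remaining.
    """
    total = max(smoothing_rounds, default=0)
    for round_idx in range(total):
        yield {i for i, r in enumerate(smoothing_rounds) if round_idx < r}

# Feature 2 stops smoothing after round 1; features 0 and 1 after round 3.
schedule = list(smoothing_schedule([3, 3, 1]))
# schedule == [{0, 1, 2}, {0, 1}, {0, 1}]
```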

paulbkoch avatar Aug 08 '25 23:08 paulbkoch

OK, let me first try to understand the methodology with the two separate EBMs. I tried to modify the regression-with-synthetic-data example as follows:

from interpret.glassbox import ExplainableBoostingRegressor
from interpret.glassbox import merge_ebms

# Terms to exclude from each model: ebm_smooth keeps (and smooths)
# features 0-3, ebm_non_smooth keeps the remaining features.
features_smooth = [(i,) for i in range(X_train.shape[1]) if i > 3]
features_non_smooth = [(i,) for i in range(X_train.shape[1]) if i <= 3]

ebm_smooth = ExplainableBoostingRegressor(
    names,
    types,
    interactions=3,
    smoothing_rounds=5000,
    reg_alpha=10.0,
    max_rounds=5000,
    exclude=features_smooth,
)
ebm_smooth.fit(X_train, y_train)

ebm_non_smooth = ExplainableBoostingRegressor(
    names,
    types,
    interactions=3,
    smoothing_rounds=0,
    reg_alpha=10.0,
    max_rounds=5000,
    exclude=features_non_smooth,
)
ebm_non_smooth.fit(
    X_train,
    y_train,
    init_score=ebm_smooth,
)

ebm_non_smooth.bins_ = [
    l1 if len(l2) == 0 else l2 for l1, l2 in zip(ebm_smooth.bins_, ebm_non_smooth.bins_)
]
ebm_smooth.bins_ = [
    l1 if len(l2) == 0 else l2 for l1, l2 in zip(ebm_non_smooth.bins_, ebm_smooth.bins_)
]
ebm = merge_ebms([ebm_smooth, ebm_non_smooth])
for i in range(len(ebm.term_features_)):
    ebm.scale(i, 2.0)
    ebm.bin_weights_[i] *= 0.5

ebm.intercept_ = ebm_smooth.intercept_ + ebm_non_smooth.intercept_

ebm.bagged_intercept_ = None
ebm.bagged_scores_ = None
ebm.standard_deviations_ = None

m-maggi avatar Aug 09 '25 05:08 m-maggi

from interpret.glassbox import ExplainableBoostingRegressor
from interpret.glassbox import merge_ebms

exclude = [(i,) for i in range(X_train.shape[1]) if i > 3]

ebm_smooth = ExplainableBoostingRegressor(
    names,
    types,
    interactions=0,
    smoothing_rounds=5000,
    reg_alpha=10.0,
    max_rounds=5000,
    exclude=exclude,
)
ebm_smooth.fit(X_train, y_train)

ebm_non_smooth = ExplainableBoostingRegressor(
    names,
    types,
    interactions=3,
    smoothing_rounds=0,
    reg_alpha=10.0,
    max_rounds=50000,
)
ebm_non_smooth.fit(
    X_train,
    y_train,
    init_score=ebm_smooth,
)

ebm_non_smooth.bins_ = [
    l1 if len(l2) == 0 else l2 for l1, l2 in zip(ebm_smooth.bins_, ebm_non_smooth.bins_)
]
ebm_smooth.bins_ = [
    l1 if len(l2) == 0 else l2 for l1, l2 in zip(ebm_non_smooth.bins_, ebm_smooth.bins_)
]
ebm = merge_ebms([ebm_smooth, ebm_non_smooth])
for i in range(len(ebm.term_features_)):
    ebm.scale(i, 2.0)
    ebm.bin_weights_[i] *= 0.5

ebm.intercept_ = ebm_smooth.intercept_ + ebm_non_smooth.intercept_

ebm.bagged_intercept_ = None
ebm.bagged_scores_ = None
ebm.standard_deviations_ = [None] * len(ebm.term_scores_)

ebm.histogram_weights_ = [[0.0, 0.0]] * ebm.n_features_in_
ebm.histogram_edges_ = [[]] * ebm.n_features_in_

paulbkoch avatar Aug 09 '25 11:08 paulbkoch

from interpret.glassbox import ExplainableBoostingRegressor
from interpret.glassbox import merge_ebms

exclude = [(i,) for i in range(X_train.shape[1]) if i > 3]

ebm_smooth = ExplainableBoostingRegressor(
    names,
    types,
    interactions=0,
    smoothing_rounds=5000,
    reg_alpha=10.0,
    max_rounds=5000,
    exclude=exclude,
)
ebm_smooth.fit(X_train, y_train)

ebm_non_smooth = ExplainableBoostingRegressor(
    names,
    types,
    interactions=3,
    smoothing_rounds=0,
    reg_alpha=10.0,
    max_rounds=50000,
)
ebm_non_smooth.fit(
    X_train,
    y_train,
    init_score=ebm_smooth,
)

ebm_non_smooth.bins_ = [
    l1 if len(l2) == 0 else l2 for l1, l2 in zip(ebm_smooth.bins_, ebm_non_smooth.bins_)
]
ebm_smooth.bins_ = [
    l1 if len(l2) == 0 else l2 for l1, l2 in zip(ebm_non_smooth.bins_, ebm_smooth.bins_)
]
ebm = merge_ebms([ebm_smooth, ebm_non_smooth])
for i in range(len(ebm.term_features_)):
    ebm.scale(i, 2.0)
    ebm.bin_weights_[i] *= 0.5

ebm.intercept_ = ebm_smooth.intercept_ + ebm_non_smooth.intercept_

ebm.bagged_intercept_ = None
ebm.bagged_scores_ = None
ebm.standard_deviations_ = [None] * len(ebm.term_scores_)

When running this I observe the following partial dependence plot (PDP) for the feature with idx=4:

Image

The code I use for the PDP is mainly taken from the example in the scikit-learn documentation here https://scikit-learn.org/1.5/modules/generated/sklearn.inspection.PartialDependenceDisplay.html:

import numpy as np
from sklearn.inspection import PartialDependenceDisplay
from sklearn.inspection import partial_dependence

feature_idx = 4
X_explain = X_train[:1000, :]
pd_results = partial_dependence(
    ebm, X_explain, features=feature_idx, kind="average", grid_resolution=50
)
deciles = {0: np.linspace(0, 1, num=5)}
features = [(feature_idx,)]  # use the plotted feature's index so the label matches
feature_names = [f"Features #{i}" for i in range(X_train.shape[1])]
display = PartialDependenceDisplay(
    [pd_results],
    features=features,
    feature_names=feature_names,
    target_idx=0,
    deciles=deciles,
)
display.plot()

Also, the output of

ebm.explain_global()

is no longer readable:

Image

If I run the synthetic data example notebook with the fork I linked above (again the link: https://github.com/m-maggi/interpret/blob/9949c1e5b51859c706b6af75d9384a26e796485e/python/interpret-core/interpret/glassbox/_ebm/_boost.py#L197-L210) I get the following pdp

Image

and the ebm.explain_global() is readable

Image

Perhaps it would be convenient to allow passing either an integer or an array of integers to smoothing_rounds? As you were suggesting, in the case of an array, the model would use the given number of smoothing rounds for each term. I am not sure about the behaviour with the interactions, as I don't know the algorithm in detail. What do you think?

m-maggi avatar Aug 09 '25 14:08 m-maggi

The issue with having too many bins in the histograms can be fixed. merge_ebms doesn't yet merge the histograms, which are only used in the visualizations and are shown at the bottom of the graphs. I added two lines to my code above to eliminate the histograms on the merged model, which makes it more visualizable.

In terms of the model that gets generated, I'm not seeing the same issues in the shape functions. Here's my feature 4:

Image

There's also a modest improvement in the RMSE, which goes down to 0.2650 from 0.2683.

I agree it would be nice to allow the specification of smoothing_rounds per-feature. smoothing_rounds only applies to the mains, so allowing a per-feature specification should work out nicely. We have a separate parameter called interaction_smoothing_rounds for the interactions. If the user specifies explicit interactions, then we could do the same there.
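
If smoothing_rounds were extended this way, the fit path would also need to normalize an int-or-list value into one entry per main-effect feature. A minimal sketch of such a validator (a hypothetical helper, not existing interpret code):

```python
def normalize_smoothing_rounds(smoothing_rounds, n_features):
    """Expand smoothing_rounds into one non-negative int per feature.

    An int applies uniformly to all features; a sequence must supply
    exactly one value per feature, as in the per-feature proposal above.
    """
    if isinstance(smoothing_rounds, int):
        return [smoothing_rounds] * n_features
    rounds = list(smoothing_rounds)
    if len(rounds) != n_features:
        raise ValueError(
            f"smoothing_rounds has {len(rounds)} entries but there are "
            f"{n_features} features"
        )
    if any(r < 0 for r in rounds):
        raise ValueError("smoothing_rounds entries must be non-negative")
    return rounds

# Example: a scalar is broadcast, a list is validated as-is.
assert normalize_smoothing_rounds(1000, 3) == [1000, 1000, 1000]
assert normalize_smoothing_rounds([1000, 0, 1000], 3) == [1000, 0, 1000]
```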

paulbkoch avatar Aug 09 '25 19:08 paulbkoch

OK, I don't know why I get the "decreasing" effect for feature 4 when I use the merged EBM; I even got it on two different machines and operating systems (Windows and macOS). I would be glad to submit a PR for customising the number of smoothing rounds for main features and interactions; I don't know when I will be able to, though. We could also allow the user to specify L1 and L2 regularisation per feature. What do you think?

m-maggi avatar Aug 10 '25 07:08 m-maggi

A PR for per-feature smoothing rounds would be great.

L1 and L2 per feature would also be interesting. The problem with those is that L1 and L2 are also used in interactions, so it's not quite as nice to specify from an API point of view. I guess we could just make the regularization zero for interactions if someone specifies these per-feature, though. The user could always do multi-stage EBMs as above if they want to apply L1/L2 to the interactions.

Sorry this part of the package is as messy as it is currently. Eventually, we do want to implement scikit-learn style warm_start, which should make fitting multi-stage EBMs a lot easier.

paulbkoch avatar Aug 10 '25 07:08 paulbkoch

For the 2nd PDP that isn't decreasing, is that a PDP from an EBM without interactions?

paulbkoch avatar Aug 10 '25 07:08 paulbkoch

For the 2nd PDP that isn't decreasing, is that a PDP from an EBM without interactions?

It is from my fork, where I hardcoded the smoothing for the first three variables only (see https://github.com/m-maggi/interpret/blob/9949c1e5b51859c706b6af75d9384a26e796485e/python/interpret-core/interpret/glassbox/_ebm/_boost.py#L197-L210). The example notebook is unchanged, i.e. the EBM has interactions=3. So, to recap the results of running my fork: the cell

from interpret.glassbox import ExplainableBoostingRegressor

ebm = ExplainableBoostingRegressor(
    names, types, interactions=3, smoothing_rounds=5000, reg_alpha=10.0
)
ebm.fit(X_train, y_train)

ran in 21.1 seconds and produced this PDP:

Image

If I run your code from https://github.com/interpretml/interpret/issues/626#issuecomment-3170615331 using the source from commit b6b7e13a02b878cbfc04605d543e0efbd732ae5c, the comparable fit cell runs in 53.9 seconds and I get this PDP:

Image

So now the EBM merging approach also seems fine; perhaps I made some mistakes with the local editable install. Many thanks for your help @paulbkoch!

m-maggi avatar Aug 10 '25 08:08 m-maggi

Thanks for verifying that @m-maggi!

paulbkoch avatar Aug 10 '25 09:08 paulbkoch