
Possible data leakage in ENBPI time series tutorial

Status: Open. samuelefiorini opened this issue 5 months ago (0 comments).

Is your documentation request related to a problem? Please describe.

While reading the MAPIE tutorial notebooks, I noticed a potential data-leakage issue in the "Tutorial for time series" example.

When model_params_fit_not_done=True, the same X_train and y_train are used for all three of the following steps:

  1. Hyperparameter tuning via RandomizedSearchCV with a TimeSeriesSplit cross-validator
  2. Fitting the final model
  3. Conformalization with TimeSeriesRegressor(method="enbpi")
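To make the overlap concrete, here is a plain-Python sketch that mimics the default expanding-window behaviour of scikit-learn's TimeSeriesSplit (assuming gap=0, max_train_size=None and the default test size). The helper `expanding_window_folds` is hypothetical and not part of MAPIE or scikit-learn. It shows that every index of X_train appears in some tuning fold, so all of it influences the selected hyperparameters before being reused in steps 2 and 3.

```python
def expanding_window_folds(n_samples, n_splits):
    """Yield (train, test) index lists like TimeSeriesSplit's default settings.

    Assumption: mirrors scikit-learn's expanding-window scheme with gap=0,
    max_train_size=None and test_size=n_samples // (n_splits + 1).
    """
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))  # everything before the test block
        test = list(range(test_start, test_start + test_size))
        yield train, test


n_train = 10  # toy length of X_train / y_train
folds = list(expanding_window_folds(n_train, n_splits=4))

# Indices that influenced hyperparameter selection (step 1).
used_for_tuning = set()
for train_idx, test_idx in folds:
    used_for_tuning.update(train_idx)
    used_for_tuning.update(test_idx)

# Indices later used to fit the final model and to conformalize (steps 2 and 3).
used_for_fit_and_calibration = set(range(n_train))

overlap = used_for_tuning & used_for_fit_and_calibration
print(sorted(overlap))  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In this toy setting the overlap is the entire training set, which is exactly the situation described above.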

Even though ENBPI uses out-of-bag residuals, wouldn't this still introduce leakage and produce overly optimistic (too narrow) intervals, since the calibration residuals are computed on the same data that drove hyperparameter selection?
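For context, the out-of-bag idea behind ENBPI can be sketched as follows: the residual for point i is computed only from ensemble members whose bootstrap sample excluded i. This is a simplified illustration, assuming a trivial mean predictor, mean aggregation and a few hand-picked bootstrap index sets; `fit_mean_model` and `oob_residuals` are hypothetical names, and the actual procedure uses many resamples (block bootstrap for time series). What it highlights is that "out-of-bag" holds only with respect to the ensemble members: nothing in this step excludes point i from the hyperparameter search that happened earlier.

```python
from statistics import mean


def fit_mean_model(y, idx):
    """Toy 'model': always predicts the mean of y over its bootstrap sample."""
    value = mean(y[i] for i in idx)
    return lambda i: value


def oob_residuals(y, bootstrap_samples):
    """Absolute out-of-bag residuals, aggregating OOB predictions with the mean.

    For each point i, only models whose bootstrap sample did NOT contain i
    contribute to the prediction used for i's residual.
    """
    models = [(set(idx), fit_mean_model(y, idx)) for idx in bootstrap_samples]
    residuals = {}
    for i in range(len(y)):
        oob_preds = [model(i) for sample, model in models if i not in sample]
        if oob_preds:  # points seen by every model get no OOB residual
            residuals[i] = abs(y[i] - mean(oob_preds))
    return residuals


y = [1.0, 2.0, 3.0, 4.0]
samples = [[0, 1], [2, 3], [1, 2]]  # hand-picked, for illustration only
print(oob_residuals(y, samples))  # prints {0: 2.0, 1: 1.5, 2: 1.5, 3: 2.0}
```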

Describe the solution you'd like

The tutorial should recommend splitting off a separate chronological calibration window, held out from both hyperparameter tuning and model fitting, and provide a practical example of how to implement this.
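A minimal sketch of what such an example might look like, in plain Python and under simplifying assumptions: the series is split in time order into a fit window (used for both the hyperparameter search and the final fit), a calibration window that neither step sees, and a test window. Absolute residuals on the calibration window then give a half-width via the standard split-conformal rank ceil((n + 1)(1 - alpha)). Note that this swaps in plain split conformal prediction rather than MAPIE's ENBPI, that `chronological_split` and `conformal_half_width` are hypothetical helpers, and that the residual values are made up for illustration. Because time series are not exchangeable, the usual coverage guarantee is only approximate here.

```python
import math


def chronological_split(n_samples, n_calib, n_test):
    """Return (fit, calib, test) index ranges in time order, with no overlap.

    fit:   used for hyperparameter search AND fitting the final model
    calib: held out from both, used only to compute conformity scores
    test:  most recent data, on which coverage is evaluated
    """
    if n_calib <= 0 or n_test <= 0 or n_calib + n_test >= n_samples:
        raise ValueError("need non-empty fit, calibration and test windows")
    n_fit = n_samples - n_calib - n_test
    fit = range(0, n_fit)
    calib = range(n_fit, n_fit + n_calib)
    test = range(n_fit + n_calib, n_samples)
    return fit, calib, test


def conformal_half_width(calib_residuals, alpha):
    """Split-conformal half-width from calibration residuals (y - y_hat).

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute residual;
    if that rank exceeds n, the interval is unbounded.
    """
    scores = sorted(abs(r) for r in calib_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return scores[rank - 1]


fit, calib, test = chronological_split(100, n_calib=20, n_test=20)
print(fit, calib, test)  # prints range(0, 60) range(60, 80) range(80, 100)

# Hypothetical residuals of the final model on (part of) the calibration window.
calib_residuals = [0.5, -1.2, 0.3, 0.9, -0.4, 1.5, -0.7, 0.2, 1.1, -0.6]
q = conformal_half_width(calib_residuals, alpha=0.2)
print(q)  # prints 1.2, so a test interval would be [y_hat - 1.2, y_hat + 1.2]
```

In a notebook, the RandomizedSearchCV/TimeSeriesSplit search and the final fit would be restricted to the `fit` indices, and only the `calib` indices would feed the conformalization step; how best to express that with MAPIE's own classes is left open here.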

samuelefiorini, Nov 14 '25 14:11