Slow calculation of Partial Dependence Data and Plot

Open PhilippPro opened this issue 6 months ago • 1 comments

Dear @giuseppec

We are trying out the iml package in combination with mlr3, and the calculations are quite slow.

Here is some raw code. I may be able to post a reproducible example later.

Our dataset has around 1 million observations and around 50 variables.

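library(mlr3)
library(mlr3extralearners)  # provides the classif.lightgbm learner
library(iml)

# `task` is our classification task (its construction is not shown here)

# LightGBM Learner
learner = mlr_learners$get("classif.lightgbm")
split = partition(task, ratio = 0.67)
learner$train(task, row_ids = split$train)

# Features and target of the test split
task_x = task$data(rows = split$test, cols = task$feature_names)
task_y = task$data(rows = split$test, cols = task$target_names)

predictor = Predictor$new(learner, data = task_x, y = task_y)

# PDP and ICE curves for a single feature on a 3-point grid
effect = FeatureEffect$new(predictor, feature = "my_variable", method = "pdp+ice", grid.size = 3)
effect$plot(rug = FALSE)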

The slow functions are FeatureEffect$new and effect$plot.

I implemented the same calculation myself with data.table and it was around 10 times faster. My code just replaces the variable in the data.table and predicts on this new data.
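
For readers who want to see what such a manual approach could look like, here is a minimal sketch. It is not the code referred to above. The helper name `manual_pdp`, the toy data, and the `lm` model are hypothetical and only serve to make the example runnable. It assumes a numeric feature and a `predict_fun` that returns one numeric prediction per row.

```r
library(data.table)

# Hypothetical helper (not part of iml): manual PDP + ICE for one numeric feature.
# `predict_fun(newdata)` must return one numeric prediction per row.
manual_pdp <- function(data, feature, predict_fun, grid_size = 3) {
  dt <- copy(as.data.table(data))  # work on a copy so the caller's data is untouched
  grid <- seq(min(dt[[feature]]), max(dt[[feature]]), length.out = grid_size)

  # For each grid value: overwrite the feature column in place, then predict
  ice <- rbindlist(lapply(grid, function(g) {
    set(dt, j = feature, value = g)
    data.table(id = seq_len(nrow(dt)), grid_value = g, prediction = predict_fun(dt))
  }))

  # The PDP is the average of the ICE curves at each grid value
  pdp <- ice[, .(prediction = mean(prediction)), by = grid_value]
  list(ice = ice, pdp = pdp)
}

# Toy usage with a linear model standing in for the real learner
set.seed(1)
toy <- data.table(x1 = runif(100), x2 = runif(100))
toy[, y := 2 * x1 + x2 + rnorm(100, sd = 0.1)]
fit <- lm(y ~ x1 + x2, data = toy)

res <- manual_pdp(toy[, .(x1, x2)], "x1", function(d) predict(fit, newdata = d))
print(res$pdp)
```

Each grid value costs one predict call on the full data, so a grid of size 3 needs only three calls no matter how many rows there are.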

Jul 16 '25 12:07 PhilippPro

We found out that the batch.size needs to be changed.

Maybe this should be set to a different default.

predictor = Predictor$new(learner, data = task_x, y = task_y, batch.size = 50000)
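
One possible reason the batch size matters so much is the number of separate predict calls it implies. The following base R sketch is only an illustration and does not reproduce iml's internals. The helper `predict_in_batches` and the toy predict function are hypothetical. It shows that the call count drops sharply as the batch size grows, while the predictions stay the same.

```r
# Hypothetical helper (not part of iml): predict `data` in chunks of `batch_size` rows
predict_in_batches <- function(predict_fun, data, batch_size) {
  n <- nrow(data)
  starts <- seq(1, n, by = batch_size)  # first row index of each batch
  parts <- lapply(starts, function(s) {
    rows <- s:min(s + batch_size - 1, n)
    predict_fun(data[rows, , drop = FALSE])
  })
  list(prediction = unlist(parts), n_calls = length(starts))
}

# Toy stand-in for a model's predict function
toy <- data.frame(x = seq_len(100000))
toy_predict <- function(d) d$x * 2

small <- predict_in_batches(toy_predict, toy, batch_size = 1000)
large <- predict_in_batches(toy_predict, toy, batch_size = 50000)

small$n_calls  # 100 predict calls
large$n_calls  # 2 predict calls
```

If every predict call carries a fixed overhead, as is plausible when a learner is wrapped by several layers, fewer and larger batches reduce the total time at the cost of more memory per call.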

Jul 22 '25 13:07 PhilippPro