Slow calculation of Partial Dependence Data and Plot
Dear @giuseppec
We are trying out the iml package in combination with mlr3 and are seeing quite slow calculations.
Here is some raw code. I may be able to post a reproducible example later.
Our dataset has around 1 million observations and around 50 variables.
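library(mlr3)
library(iml)
# classif.lightgbm comes from an extension package, e.g. mlr3extralearners
library(mlr3extralearners)

# LightGBM learner; `task` is an existing mlr3 classification task
learner = mlr_learners$get("classif.lightgbm")
split = partition(task, ratio = 0.67)
learner$train(task, row_ids = split$train)

# Features and target of the test set
task_x = task$data(rows = split$test, cols = task$feature_names)
task_y = task$data(rows = split$test, cols = task$target_names)

# Wrap the model for iml, then compute PDP and ICE curves for one feature
predictor = Predictor$new(learner, data = task_x, y = task_y)
effect = FeatureEffect$new(predictor, feature = "my_variable", method = "pdp+ice", grid.size = 3)
effect$plot(rug = FALSE)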
The slow calls are FeatureEffect$new and effect$plot.
I implemented the partial dependence calculation myself with data.table and it was around 10 times faster. In my code I simply overwrote the variable in the data.table with each grid value and predicted on the modified data.
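For illustration, the replace-and-predict approach described above could look roughly like the sketch below. It is not the original code from this post: `manual_pdp` is a hypothetical helper, and a small linear model on the built-in `mtcars` data stands in for the LightGBM learner so that the example runs on its own.

```r
library(data.table)

# Hypothetical helper (not part of iml): for each grid value, overwrite the
# feature column for all rows, predict, and average the predictions.
manual_pdp <- function(predict_fun, data, feature, grid) {
  dt <- copy(as.data.table(data))  # work on a copy so the input stays untouched
  rbindlist(lapply(grid, function(value) {
    set(dt, j = feature, value = value)  # replace the feature by reference
    data.table(feature_value = value, pd = mean(predict_fun(dt)))
  }))
}

# Toy usage: a linear model as a stand-in for the real learner.
fit <- lm(mpg ~ wt + hp, data = mtcars)
grid <- quantile(mtcars$wt, probs = c(0.1, 0.5, 0.9), names = FALSE)
pdp <- manual_pdp(function(d) predict(fit, newdata = d), mtcars, "wt", grid)
print(pdp)
```

Each grid value costs a single predict call on the full data, so a grid of size 3 needs only three calls in this sketch.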
We found out that batch.size needs to be changed.
Perhaps its default should be set to a different value.
predictor = Predictor$new(learner, data = task_x, y = task_y, batch.size = 50000)
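As a rough illustration of why the batch size can matter, the sketch below assumes that predictions are made in chunks of at most batch.size rows, with one predict call per chunk. `predict_in_batches` is a hypothetical helper, not iml's internal code, and the linear model on `mtcars` is only a stand-in.

```r
# Hypothetical helper (not iml's internals): predict in chunks of at most
# batch_size rows and count how often the model is called.
predict_in_batches <- function(predict_fun, data, batch_size) {
  starts <- seq(1, nrow(data), by = batch_size)
  preds <- lapply(starts, function(s) {
    rows <- s:min(s + batch_size - 1, nrow(data))
    predict_fun(data[rows, , drop = FALSE])
  })
  list(prediction = unlist(preds, use.names = FALSE), n_calls = length(starts))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
f <- function(d) predict(fit, newdata = d)

small <- predict_in_batches(f, mtcars, batch_size = 5)
large <- predict_in_batches(f, mtcars, batch_size = 50)
print(c(small = small$n_calls, large = large$n_calls))
```

The predictions are the same in both cases; only the number of predict calls changes, and with it the fixed overhead paid per call.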