permute_cases: Error arguments imply differing number of rows: 30000, 0
Dear lime contributors, thanks for your awesome work on this repository. I ran into an error that took me several days to track down. It is reproducible:
explanation.lime <- lime::explain(
x = local.obs,
explainer = explainer.lime,
n_features = 5
)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 30000, 0
I have narrowed down both the location in the source code and the conditions that trigger the error, but not completely, so I hope you can figure out the last mile.
The trigger is a column in the cases argument of permute_cases that is of type integer and has zero variance. In my case it is the column reviews.numHelpful:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of 13 variables:
$ reviews.doRecommend: Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2
$ reviews.numHelpful : int 0 0 0 0 0 0
$ reviews.rating : int 4 4 4 5 5 5
$ anger : num 0 0 0 0 0 0
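To check other data sets for the same condition, here is a minimal sketch in base R. The data frame is rebuilt from the four columns shown above only (the other nine are omitted), and the helper name flag_constant_integers is made up for illustration; it is not part of lime.

```r
# Rebuilt from the str() output above; only the four columns shown there.
local.obs <- data.frame(
  reviews.doRecommend = factor(rep("TRUE", 6), levels = c("FALSE", "TRUE")),
  reviews.numHelpful  = rep(0L, 6),
  reviews.rating      = c(4L, 4L, 4L, 5L, 5L, 5L),
  anger               = rep(0, 6)
)

# Hypothetical helper: names of integer columns that hold a single distinct value.
flag_constant_integers <- function(df) {
  hit <- vapply(
    df,
    function(col) is.integer(col) && length(unique(col)) == 1L,
    logical(1)
  )
  names(df)[hit]
}

flagged <- flag_constant_integers(local.obs)
print(flagged)
```

Note that anger is also constant but is of type num, so this check does not flag it.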
This column leads to an empty output within the permute_cases.data.frame function, in the branch of the if/else statement that computes "bin":
} else if (is.numeric(cases[[i]]) && bin_continuous) {
bin <- sample(seq_along(feature_distribution[[i]]), nrows, TRUE, as.numeric(feature_distribution[[i]]))
diff(bin_cuts[[i]])[bin] * runif(nrows) + bin_cuts[[i]][bin]
}
which can be seen in the second element of the resulting list:
$ : Factor w/ 2 levels "1","2": 1 2 1 2 2 2 2 1 2 1 ...
$ : int(0)
$ : int [1:30000] 14 5 5 19 31 10 27 7 10 10 ...
$ : num [1:30000] 0.021654 0.081145 0.039533 0.000972 0.029057 ...
I isolated the conversion to a data frame and found that this is the line that throws the above error:
perms <- as.data.frame(perms, stringsAsFactors = FALSE)
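The same error can be reproduced outside of lime with base R alone. This is a minimal sketch using a made-up list: the element names a, b, c and their values are hypothetical stand-ins for perms, chosen so that one element has length zero.

```r
# Hypothetical stand-in for perms: two full-length elements and one empty one.
perms_demo <- list(
  a = sample(1:2, 30000, replace = TRUE),
  b = integer(0),
  c = runif(30000)
)

# Capture the error message instead of stopping the script.
res <- tryCatch(
  as.data.frame(perms_demo, stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)
```

So the conversion itself is not at fault; it only surfaces the zero-length element produced earlier.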
For the second column, feature_distribution[[2]] gives:
FALSE TRUE
0.04648887 0.95351113
This is wrong! This result belongs to the only factor, i.e. the first column, and should therefore be returned by feature_distribution[[1]], not by feature_distribution[[2]].
Consequently, the next line, diff(bin_cuts[[2]])[bin], always returns NULL, which leads to the empty return value integer(0).
So far I could narrow the root cause down to this point, but I do not fully understand what diff(bin_cuts[[2]])[bin] is meant to do, or how the empty result can be prevented.
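To make the failure mode concrete, here is a minimal sketch of the two lines in isolation, with made-up values. As far as I can reconstruct it, the expression samples a bin index according to the bin probabilities and then draws a uniform value inside that bin (bin width times a uniform number, plus the lower edge). The cut points and probabilities below are hypothetical, and setting the cuts to NULL for the mismatched case is my assumption about what bin_cuts holds for that entry.

```r
set.seed(1)
nrows <- 10

# Intended case: three bins with hypothetical edges and probabilities.
cuts  <- c(0, 1, 2, 5)
probs <- c(0.5, 0.3, 0.2)
bin   <- sample(seq_along(probs), nrows, TRUE, probs)
ok    <- diff(cuts)[bin] * runif(nrows) + cuts[bin]

# Mismatched case: a two-level factor distribution paired with no cut points.
probs_factor <- c(0.05, 0.95)
cuts_null    <- NULL  # assumption: no cuts are available for this entry
bin2   <- sample(seq_along(probs_factor), nrows, TRUE, probs_factor)
broken <- diff(cuts_null)[bin2] * runif(nrows) + cuts_null[bin2]

print(length(ok))      # one value per requested row
print(length(broken))  # zero-length result
```

In the second case diff() has nothing to work on, indexing the result by bin2 yields NULL, and the arithmetic collapses to a zero-length vector, which matches the empty element seen above.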
Update
I found a potential reason for this apparent index problem.
The feature distribution includes the target variable .outcome as its first list item, and thus all positional indices are off by one:
$.outcome
1 2
0.3277057 0.6722943
$reviews.doRecommend
FALSE TRUE
0.04648887 0.95351113
$reviews.numHelpful
1 2 3 4
0.9981241334 0.0012233912 0.0001631188 0.0004893565
$anger
1 2 3 4
0.911100237 0.065900008 0.013620422 0.009379333
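The suspected off-by-one can be illustrated with a small sketch. The objects below are hypothetical miniatures with rounded, made-up values, not lime's real internals, and I do not know whether lime looks entries up by position or by name; this only shows how the two lookups diverge once .outcome sits at the front of the list.

```r
# Hypothetical miniature of the cases passed to permute_cases.
cases_demo <- data.frame(
  reviews.doRecommend = factor(c("TRUE", "TRUE", "FALSE")),
  reviews.numHelpful  = c(0L, 0L, 0L)
)

# Hypothetical miniature of the feature distribution, target first.
fd_demo <- list(
  .outcome            = c(`1` = 0.33, `2` = 0.67),
  reviews.doRecommend = c(`FALSE` = 0.05, `TRUE` = 0.95),
  reviews.numHelpful  = c(`1` = 1)
)

i <- 2  # second column of the cases: reviews.numHelpful

by_position <- names(fd_demo)[i]     # what a positional lookup hits
by_name     <- names(cases_demo)[i]  # what the column is actually called

print(by_position)
print(by_name)
print(fd_demo[[by_name]])            # lookup by name finds the right entry
```

With the target in first position, index 2 lands on the factor's distribution, which is exactly the FALSE/TRUE result shown earlier.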
However, including the target variable seems unavoidable, because the documentation for ?lime specifies:
x The training data used for training the model that should be explained.
So the training data (including the target), not just the features (excluding the target), must be passed to lime::lime(). Now I wonder:
Is this a problem in lime::lime() or in permute_cases()?
Can you fix this? It looks tricky.