
Predictions refuses features from test set with incorrect variable names even if renamed to the correct variable name

Open wvdvegte opened this issue 3 years ago • 4 comments

What's wrong? If Predictions receives a test set whose features have variable names different from those in the training data, it won't recognize the features, and the predictors will fail. This is as expected. However, it should be possible to use Edit Domain to change a feature's variable name to the 'correct' name (the same as in the training set) so that Predictions recognizes the feature. It doesn't work: Predictions still seems to 'see' the unchanged, 'wrong' variable name. On the other hand, if the feature has the 'correct' name in the test set, is changed to a 'wrong' name using Edit Domain, and is then changed back to the 'correct' name, Predictions recognizes it as the intended feature.

How can we reproduce the problem? See the attached workflow (it uses URLs for input files), which tests several options.

What's your environment?

  • Operating system: macOS 12.4 (Apple Silicon)
  • Orange version: 3.32.0
  • How you installed Orange: from DMG

Attachment: Predictions bug.ows.zip

wvdvegte avatar Jun 05 '22 13:06 wvdvegte

This is not a bug, it is by design. In the background, Orange uses compute_value, a function attached to a variable that describes how to compute its values from the source data, so that any new rows can be transformed in the same way as the training data. This is particularly useful in Test and Score and Predictions, where one doesn't have to apply any transformation to the test data (everything is done automatically).

We have already foreseen the issue you are having. In Edit Domain, simply check "unlink variable from its source variable", which will remove its compute_value, thus enabling you to compare variables by name only (not by their inherent similarity).
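
To make the idea concrete, here is a minimal toy sketch in plain Python. It does not use Orange's actual classes: ToyVariable, Identity, rename and unlink are hypothetical names, and the assumption that a rename attaches an identity-style compute_value pointing at the source variable is only an illustration of the behaviour described above. It shows why a renamed variable can have the right name and still fail a strict comparison until it is unlinked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    """Hypothetical transform: 'copy the values of `source` unchanged'."""
    source: "ToyVariable"


@dataclass(frozen=True)
class ToyVariable:
    """Toy stand-in for a variable: a name plus an optional compute_value.

    The generated __eq__ compares both fields, so two variables are equal
    only if their names AND their compute_value agree.
    """
    name: str
    compute_value: Optional[Identity] = None


def rename(var: ToyVariable, new_name: str) -> ToyVariable:
    # Assumption: renaming keeps a link back to the variable it came from.
    return ToyVariable(new_name, compute_value=Identity(var))


def unlink(var: ToyVariable) -> ToyVariable:
    # Drop the link to the source; only the name remains.
    return ToyVariable(var.name, compute_value=None)


train_feature = ToyVariable("age")          # raw variable in the training set
test_feature = ToyVariable("Age_years")     # same quantity, 'wrong' name
renamed = rename(test_feature, "age")       # name fixed, link to source kept

matches_by_name = renamed.name == train_feature.name
matches_strictly = renamed == train_feature
matches_after_unlink = unlink(renamed) == train_feature

print(matches_by_name, matches_strictly, matches_after_unlink)
```

In this toy model the renamed variable matches by name but not under the strict comparison, and it matches again once the link to its source is removed.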

ajdapretnar avatar Jun 06 '22 06:06 ajdapretnar

> In Edit Domain, simply check "unlink variable from its source variable"

@ajdapretnar Thanks for the explanation. I'd like to do that, but that option is greyed out.

wvdvegte avatar Jun 06 '22 08:06 wvdvegte

I am reopening this one. I know Orange is strict about variable reuse, but this case is different: the user had a raw variable and renamed it, so they should still have a functionally raw (but renamed) variable. I think the matches should indeed be possible in their case, and not allowing them is a bug.

markotoplak avatar Jun 06 '22 08:06 markotoplak

We tried it: this problem could be solved if Edit Domain allowed unlinking of renamed variables. This widget needs to be changed in at least two places: it should no longer disable the checkbox, and requires_unlinking should not check whether there are any transformations that might add compute_value (because this test does not seem to work properly?).

The checkbox has a decent tooltip, but it may have to also include a tip that one can rename variables in that fashion. Perhaps also include why you may not want to unlink -- the "history" of changes (e.g. discretization) is lost so the model may not properly apply to new data.

janezd avatar Jun 10 '22 09:06 janezd