The `switch` argument in `DropHighPSIFeatures` should be removed
Describe the bug
In DropHighPSIFeatures it is stated that the PSI metric is not symmetric, and hence a boolean argument switch is introduced. The PSI however is symmetric by definition, so this argument should be removed.
PSI(a, b) = PSI(b, a) by definition, as it is a symmetrized version of the KL divergence. See p. 85 of Kullback & Leibler 1951.
Thanks for raising this issue adn linking the resource. I'd have to read the paper before making a decision.
@gverbock @glevv are you perhaps more familiar than me on this topic?
Yes, this is correct. PSI is one of the symmetrical forms of KLD, so there is no need in switch argument.
To compute the PSI you need to define bins (for example based on the decile of the population). In PSI(a,b), a is selected to define the bins. Using the switch argument, b will be selected. If a and b have different distributions, I expect PSI(a,b) will not be equal to PSI(b,a) because the binning is different. Give me a couple of days to think about it. Happy to remove the switch if it has no impact.
I see.
In this case, we may need to re-word the docs and user guide, because we are saying that the PSI is not symmetric, and if I understand correctly, what's not symmetric is how we calculate it due to the binning.
I'll take a look at the original article when I get the change and give it another thought.
Thank you so much for your contributions!
In that case the wording is misleading. The PSI metric is defined to be symmetric. In the implementation here you make the decision of binning with respect to a (and hence the symmetry breaks). I propose you remove the argument altogether as it makes it seem that the PSI is asymmetric. Instead, explain the binning process and the implications of it.
But if the formula implemented by the estimator is asymmetric, because of the binning, then it makes sense to keep the switch argument imo. Perhaps the explanation of the binning formula, and the asymmetry property, should be added more clearly to the docstring.