feature_engine MAD-Median rule for outlier removal

Is your feature request related to a problem? Please describe. We could add an outlier rule similar to the Gaussian one (mean ± k * std) but based on robust statistics (median ± k * MAD). It is mentioned very often in statistics books (for example, Introduction to Robust Estimation and Hypothesis Testing by R.R. Wilcox), so I'm surprised it is not here.

Describe the solution you'd like Add a MAD option to base_outlier.
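For reference, a minimal sketch of the median ± k * MAD rule, using only the Python standard library. The function name `mad_limits` and its `fold` default are hypothetical and are not part of feature_engine. The MAD is left unscaled here; some texts multiply it by about 1.4826 so that it is comparable to the standard deviation under normality.

```python
from statistics import median


def mad_limits(values, fold=3.0):
    """Return (lower, upper) limits as median -/+ fold * MAD.

    Hypothetical helper for illustration; not feature_engine's API.
    """
    values = list(values)
    med = median(values)
    # MAD: median of the absolute deviations from the median (unscaled).
    mad = median(abs(v - med) for v in values)
    return med - fold * mad, med + fold * mad


data = [1, 2, 3, 4, 100]
lower, upper = mad_limits(data, fold=3.0)
# Observations outside the limits are flagged as outliers.
outliers = [v for v in data if v < lower or v > upper]
```

Because both the center and the spread are medians, a single extreme value such as 100 barely moves the limits, which is the appeal over mean ± k * std.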

P.S. There is a way to ditch the fold parameter and have one general parameter with the same range and meaning for all methods, but a lot of code, docs, and tests would have to be rewritten.

Aug 12 '22 14:08 glevv

Go for it.

What do you have in mind as a substitute for the fold parameter?

Aug 13 '22 11:08 solegalli

We can translate multipliers to percentiles. For example, folds of 2 and 3 for the Gaussian method are the 2-sigma and 3-sigma rules, which cover about 95.5% and 99.7% of observations respectively under a normal distribution (wiki article if anyone is interested). For the IQR, a fold of 1.5 corresponds to approximately 99.3% (this medium article gives a simpler explanation). And so on. This way we can generalize all these multipliers into one percentile parameter that controls what percentage of observations (approximately) will be treated as non-outliers (or the inverse, if we use 1-p). It is not that hard to calculate the multiplier from this parameter.
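A rough sketch of that conversion, assuming normally distributed data and using only the standard library's `statistics.NormalDist`. The function names and the `coverage` parameter are hypothetical, not anything in feature_engine, and the MAD variant assumes an unscaled MAD (about 0.6745 standard deviations under normality).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()            # mean 0, standard deviation 1
_Q75 = _STD_NORMAL.inv_cdf(0.75)      # third quartile, about 0.6745


def gaussian_fold(coverage):
    """Number of sigmas so that mean +/- fold * std covers `coverage`."""
    return _STD_NORMAL.inv_cdf((1 + coverage) / 2)


def iqr_fold(coverage):
    """Fold so that [Q1 - fold * IQR, Q3 + fold * IQR] covers `coverage`."""
    z = gaussian_fold(coverage)
    # Under normality Q3 = _Q75 and IQR = 2 * _Q75 (in sigma units).
    return (z - _Q75) / (2 * _Q75)


def mad_fold(coverage):
    """Fold so that median +/- fold * MAD covers `coverage` (unscaled MAD)."""
    return gaussian_fold(coverage) / _Q75


for p in (0.9545, 0.993, 0.9973):
    print(p, round(gaussian_fold(p), 2), round(iqr_fold(p), 2), round(mad_fold(p), 2))
```

The mapping only holds approximately for real data, since it relies on the normality assumption for every method.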

A lot of code and tests would have to be rewritten for it, though.

P.S. I actually have a draft, but it is in its early stages and not tested. As for the MAD rule, I will make a PR soon.

Aug 13 '22 15:08 glevv

Thank you @GLevV for so much detail.

The thing is, 1.5 and 3 are values that are commonly used for the IQR, and transforming that into percentiles would lose the meaning/easy interpretability of those values.

And the same is true for the normal distribution. The number of sigmas away from the mean is a very standardized value. Transforming that into percentiles means a lot more detail in the docs to let users know what they are doing. I think it becomes less user friendly.

So I am happy to add the MAD limits, but I would not change the current logic of the fold parameter for now.

Aug 14 '22 06:08 solegalli

Ok, I will make a draft PR soon.

Aug 14 '22 09:08 glevv