
feat: Random Noise Selection method

Open TremaMiguel opened this issue 4 years ago • 10 comments

Is your feature request related to a problem? Please describe. Filtering noisy features from a set of features can be easily accomplished by adding one or more random variables to the training set, training a gradient boosting or linear model, and computing the feature importances.

When a feature's importance is below that of the random noise, that feature is probably not very important.

Reference

Describe the solution you'd like

A NoiseFilterSelection method that trains a gradient boosting model (sklearn is already a required dependency) and returns only the features with importance above that of the random variable.
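To make the idea concrete, here is a minimal sketch (the names and API are illustrative, not an existing Feature-engine class), assuming `X` is a pandas DataFrame of features and `y` the target:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Add a single random (noise) feature to the training data
rng = np.random.default_rng(42)
X_noisy = X.copy()
X_noisy["random_noise"] = rng.normal(size=len(X_noisy))

# Train a gradient boosting model and collect the importances
model = GradientBoostingClassifier().fit(X_noisy, y)
importances = pd.Series(model.feature_importances_, index=X_noisy.columns)

# Keep only the features that beat the random noise
threshold = importances["random_noise"]
selected = [col for col in X.columns if importances[col] > threshold]
```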

TremaMiguel avatar Jul 17 '21 15:07 TremaMiguel

If a feature is not important, the decision trees would already return a low importance value, and we could then select features with sklearn's SelectFromModel, for example.

I guess the value from adding "random features" would be that it would allow us to distinguish which features add "any" value other than random. Is this the overall point?

How much value does this selection method add over existing methods, like sklearn's SelectFromModel using, for example, random forests, or the select by single feature performance that we have in Feature-engine?

solegalli avatar Jul 18 '21 08:07 solegalli

Regarding

I guess the value from adding "random features" would be that it would allow us to distinguish which features add "any" value other than random. Is this the overall point?

Taking as a baseline a feature with no information (a random variable), we want to keep those variables that do contribute information to predict the target. For this, we find the features that are more useful than the baseline through the model's feature importance.

The main difference with SelectFromModel and SelectBySingleFeaturePerformance is that the user does not need to provide a threshold for filtering the features, because the random variable's feature importance acts as the threshold. Also, it is a quick way (the model is trained only once) to get a starting point for which variables help you predict the target.

TremaMiguel avatar Jul 19 '21 14:07 TremaMiguel

That is an interesting point. Let's try to find some reference other than the Kaggle video and attach it to this issue.

solegalli avatar Jul 21 '21 12:07 solegalli

@solegalli other references:

  1. Feature Selection beyond Feature Importance. This article talks about adding three different kinds of random features (see the sketch after this list):
  • Binary random feature (0 or 1)
  • Uniform random feature between 0 and 1
  • Integer random feature

and selecting only the features with importance above the random variables.

  2. satra. This is an implementation with fit and transform methods from the GitHub user satra; the difference is that it will

Keep regenerating a noise vector as long as the noise vector is more than 0.05 correlated with the output.

  3. Feature Importance by example of a Random Forest. A general exercise explaining feature importance through different methods, with the addition of a random variable.
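A quick sketch (my own illustration, not code from the articles) of how the three probe types and satra's correlation check could look, assuming `X` and `y` are the training features and target:

```python
import numpy as np

rng = np.random.default_rng(0)
n = len(X)  # X: the training DataFrame

# The three probe types from the first article
X["probe_binary"] = rng.integers(0, 2, size=n)     # 0 or 1
X["probe_uniform"] = rng.uniform(size=n)           # uniform on [0, 1)
X["probe_integer"] = rng.integers(0, 100, size=n)  # integers (range arbitrary here)

# satra's variant: regenerate the noise vector while it is
# more than 0.05 correlated with the output y
noise = rng.normal(size=n)
while abs(np.corrcoef(noise, y)[0, 1]) > 0.05:
    noise = rng.normal(size=n)
```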

As there's already some code available, I think this could be quick to implement.

TremaMiguel avatar Jul 21 '21 13:07 TremaMiguel

Hi @TremaMiguel

Thanks for adding the additional references.

I myself spent a good 90 minutes this morning searching for more information about this technique, and I did not find anything other than the KDnuggets blog and the talk by Gilberto that you linked to this issue.

The technique makes sense though, so happy to move forward.

Do you have any experience with this method? Would adding a binary or uniform random feature have any advantage over adding just a normally distributed random feature?

solegalli avatar Jul 22 '21 04:07 solegalli

This is an OK feature selection method; I just don't think it should be used as a step in a pipeline, as done in satra. It should be more of a model examination tool.

NoiseFeatureSelection(
    model,       # estimator with feature_importances_ or coef_ attribute
    noise_type,  # type of noise: uniform, normal, binary, etc.
    n_iter,
    seed,
) -> tuples of (feature name, times (or frequency) it was worse than noise)

This will allow the user to inspect specific features and try to understand why they are not very useful for the selected model, instead of being just a drop-in grey-box transformer.
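A minimal sketch of what that could look like (names hypothetical, not an existing API), assuming a pandas DataFrame `X`, a target `y`, and an sklearn-style estimator:

```python
import numpy as np
import pandas as pd
from sklearn.base import clone

def noise_feature_selection(model, X, y, noise_type="normal", n_iter=10, seed=0):
    """Count, per feature, how often it scores below a random probe feature."""
    rng = np.random.default_rng(seed)
    times_worse = {col: 0 for col in X.columns}

    for _ in range(n_iter):
        # Add a fresh probe feature of the requested noise type
        X_probe = X.copy()
        if noise_type == "normal":
            X_probe["_probe"] = rng.normal(size=len(X))
        elif noise_type == "uniform":
            X_probe["_probe"] = rng.uniform(size=len(X))
        else:  # "binary"
            X_probe["_probe"] = rng.integers(0, 2, size=len(X))

        # Fit a fresh copy of the model and read the importances
        fitted = clone(model).fit(X_probe, y)
        if hasattr(fitted, "feature_importances_"):
            scores = fitted.feature_importances_
        else:
            scores = np.abs(fitted.coef_).ravel()
        importances = pd.Series(scores, index=X_probe.columns)

        # Tally the features that scored below the probe in this iteration
        for col in X.columns:
            if importances[col] < importances["_probe"]:
                times_worse[col] += 1

    return list(times_worse.items())
```

Features with a high count would then be the candidates to inspect, rather than being dropped automatically.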

glevv avatar Feb 09 '22 10:02 glevv

Found the original article: https://jmlr.org/papers/volume3/stoppiglia03a/stoppiglia03a.pdf

Ranking a Random Feature for Variable and Feature Selection by Stoppiglia, Dreyfus et al.

Also discussed in Guyon 2003 and used in Bi et al. 2003.

solegalli avatar Jun 20 '22 13:06 solegalli

As per @GLevV's comment, the authors themselves argue that the method of adding probe features (what we call noise features here) is aimed at understanding which features have some sort of relation to the target, but not necessarily at improving model performance.

[screenshot of the relevant passage from Stoppiglia et al.]

In other words, it is better suited for model interpretability than for feature selection.

solegalli avatar Jun 20 '22 13:06 solegalli

Hey @saurabhgoel1985

This is a fairly straightforward feature selection transformer, if you fancy making a contribution.

The idea is: introduce 3 random features into the data, train a machine learning model with all features plus the 3 random ones, and select all features whose importance is greater than that of the random ones (or the mean importance of the 3 random features).

We need to create a new class called ProbeFeatureSelection located within the selection module. I'd recommend looking at the recursive feature elimination class as a template.
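A rough sketch of that selection logic (illustrative only, not the final ProbeFeatureSelection implementation), assuming an estimator that exposes feature_importances_:

```python
import numpy as np
import pandas as pd

def select_with_probes(model, X, y, n_probes=3, seed=0):
    """Add random probe features, fit the model, and keep the original
    features whose importance exceeds the mean probe importance."""
    rng = np.random.default_rng(seed)
    X_probed = X.copy()
    probe_names = [f"probe_{i}" for i in range(n_probes)]
    for name in probe_names:
        X_probed[name] = rng.normal(size=len(X_probed))

    model.fit(X_probed, y)
    importances = pd.Series(model.feature_importances_, index=X_probed.columns)

    # Use the mean importance of the probes as the selection threshold
    threshold = importances[probe_names].mean()
    return [col for col in X.columns if importances[col] > threshold]
```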

solegalli avatar Aug 15 '22 13:08 solegalli

Sure @solegalli, let me have a look and I will get back to you with any doubts. Thanks a lot for tagging me here.

saurabhgoel1985 avatar Aug 15 '22 15:08 saurabhgoel1985

@solegalli @saurabhgoel1985,

I don't see a PR or this issue referenced in another issue. I'm happy to work on it!

Please confirm that no work has been started. Thanks!

Morgan-Sell avatar Dec 06 '22 18:12 Morgan-Sell

It is all yours if you want it @Morgan-Sell :)

As in, there is no work on it yet.

Thanks a lot!

solegalli avatar Dec 07 '22 10:12 solegalli

Yay! I'll take it. I'll focus on this issue and issue #571.

Morgan-Sell avatar Dec 07 '22 22:12 Morgan-Sell