feat: Random Noise Selection method
Is your feature request related to a problem? Please describe.
Filtering noisy features from a set of features can be easily accomplished by adding one or more random variables to the train set, training a gradient boosting or linear model, and computing the feature importances.
When a feature's importance is below that of the random noise, the feature is probably not very important.
Describe the solution you'd like
A NoiseFilterSelection method that trains a gradient boosting model (sklearn is already a required dependency) and returns only the features with importance above that of the random variable.
If a feature is not important, the decision trees would already assign it a low importance value, and we could then select features using, for example, SelectFromModel from sklearn.
I guess the value of adding "random features" is that it would allow us to distinguish which features add "any" value beyond random noise. Is this the overall point?
How much value does this selection method add over already existing methods, like sklearn's SelectFromModel using, for example, random forests, or the selection by single feature performance that we have in Feature-engine?
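For reference, a minimal example of the existing sklearn route, where the user has to pick the importance threshold themselves (the dataset and threshold choice are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# toy data; in practice this would be the user's train set
X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)

# the user has to choose the importance threshold (here, the mean importance)
selector = SelectFromModel(RandomForestClassifier(random_state=0), threshold="mean")
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)  # fewer columns than X if some features fall below the threshold
```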
Regarding:
> I guess the value of adding "random features" is that it would allow us to distinguish which features add "any" value beyond random noise. Is this the overall point?
Taking as a baseline a feature with no information (a random variable), we want to keep those variables that do contribute information for predicting the target. For this, we identify the features that are more useful than the baseline through the model's feature importance.
The main difference with SelectFromModel and SelectBySingleFeaturePerformance is that the user does not need to provide a threshold for filtering the features, because the random variable's feature importance acts as the threshold. It is also a quick way (the model is trained only once) to get a starting point for which variables help predict the target.
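A minimal sketch of this idea, assuming a single normally distributed probe and sklearn's GradientBoostingClassifier (names and data are illustrative, not a proposed API):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)

# toy data with named columns
X, y = make_classification(n_samples=500, n_features=8, n_informative=3, random_state=0)
X = pd.DataFrame(X, columns=[f"var_{i}" for i in range(X.shape[1])])

# add a random variable that, by construction, carries no information about the target
X["random_probe"] = rng.normal(size=len(X))

# train the model once and use the probe's importance as the selection threshold
model = GradientBoostingClassifier(random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
threshold = importances["random_probe"]

selected = [f for f in X.columns if f != "random_probe" and importances[f] > threshold]
print(selected)
```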
That is an interesting point. Let's try to find some reference other than the Kaggle video and attach it to this issue.
@solegalli other references:
- Feature Selection beyond Feature Importance. This article talks about adding three different kinds of random features (see the short sketch after this list):
  - Binary random feature (0 or 1)
  - Uniform random feature between 0 and 1
  - Integer random feature

  and selecting only the features with importance above the random variables.
- satra. This is an implementation with fit and transform methods from the GitHub user satra; the difference is that it keeps regenerating the noise vector as long as the noise vector is more than 0.05 correlated with the output.
- Feature Importance by example of a Random Forest. A general exercise explaining feature importance through different methods, with the addition of a random variable.
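For the three probe types mentioned in the first reference, the generation itself is a one-liner each. A rough numpy sketch (the integer range is an assumption on my part):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500

# the three probe variants described in the article
probes = {
    "probe_binary": rng.integers(0, 2, size=n_samples),     # 0 or 1
    "probe_uniform": rng.uniform(0, 1, size=n_samples),     # uniform between 0 and 1
    "probe_integer": rng.integers(0, 100, size=n_samples),  # random integers (range assumed)
}
```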
As there's already some code available, I think this could be quick to implement.
Hi @TremaMiguel
Thanks for adding the additional references.
I spent a good 90 minutes myself this morning searching for more information about this technique, and I did not find anything other than the kdnuggets blog and the talk by Gilberto that you linked to this issue.
The technique makes sense though, so happy to move forward.
Do you have any experience with this method? Would adding a binary random or uniform random feature have any advantage over adding just a normally distributed random feature?
This is an OK FS method, I just don't think it should be used as a step in a pipeline, as done in satra; it should be more of a model examination tool.
NoiseFeatureSelection(
    model (with feature_importances_ or coef_ attr),
    type of noise (uniform, normal, binary, etc.),
    n_iter,
    seed
) -> tuples of (feature names, times (or frequencies) it was worse than noise)
This would allow the user to inspect specific features and try to understand why they are not very useful for the selected model, instead of being just a drop-in grey-box transformer.
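A rough, hypothetical sketch of what that examination tool could look like (the function name, return format, and defaults are my assumptions, not a settled API):

```python
import numpy as np
import pandas as pd

def probe_feature_report(model, X, y, noise="normal", n_iter=10, seed=0):
    """Fit `model` n_iter times, each time with a freshly drawn noise column,
    and report how often each feature scored below the noise."""
    rng = np.random.default_rng(seed)
    worse_than_noise = pd.Series(0, index=X.columns)

    for _ in range(n_iter):
        noise_col = rng.normal(size=len(X)) if noise == "normal" else rng.uniform(size=len(X))
        X_probe = X.assign(_noise=noise_col)
        model.fit(X_probe, y)

        if hasattr(model, "feature_importances_"):
            scores = model.feature_importances_
        else:
            # mean absolute coefficient per feature (handles 1D and 2D coef_)
            scores = np.abs(np.atleast_2d(model.coef_)).mean(axis=0)
        scores = pd.Series(scores, index=X_probe.columns)

        worse_than_noise += (scores[X.columns] < scores["_noise"]).astype(int)

    # tuples of (feature name, frequency it was worse than the noise)
    return [(name, count / n_iter) for name, count in worse_than_noise.items()]
```

The output is an inspection report rather than a transformed dataset, which matches the "model examination tool" framing above.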
Found the original article: https://jmlr.org/papers/volume3/stoppiglia03a/stoppiglia03a.pdf
Ranking a Random Feature for Variable and Feature Selection by Stoppiglia, Dreyfus et al.
Also discussed in Guyon 2003 and used in Bi et al. 2003.
As per @GLevV's comment, the authors themselves argue that the method of adding probe features (what we here call noise features) is aimed at understanding which features have some sort of relation to the target, but not necessarily at improving model performance.
In other words, it is better suited for model interpretability than for feature selection.
Hey @saurabhgoel1985
This is a fairly straightforward feature selection transformer, if you fancy making a contribution.
The idea is: introduce 3 random features into the data, train a machine learning model with all features plus the 3 random ones, and select all features whose importance is greater than that of the random ones (or the mean of the 3 random features).
We need to create a new class called ProbeFeatureSelection, located within the selection module. I'd recommend looking at the recursive feature elimination class as a template.
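For orientation, a very rough skeleton of how such a class could be laid out (not the final Feature-engine implementation; everything beyond the fit/transform convention is an assumption, and a tree-based estimator with feature_importances_ is assumed):

```python
import numpy as np
import pandas as pd

class ProbeFeatureSelection:
    """Sketch: add 3 random probe features, fit the estimator once, and keep
    the original features whose importance exceeds the mean probe importance."""

    def __init__(self, estimator, random_state=0):
        self.estimator = estimator
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        X_probe = X.copy()
        probe_names = ["probe_normal", "probe_uniform", "probe_binary"]
        X_probe["probe_normal"] = rng.normal(size=len(X))
        X_probe["probe_uniform"] = rng.uniform(size=len(X))
        X_probe["probe_binary"] = rng.integers(0, 2, size=len(X))

        self.estimator.fit(X_probe, y)
        importances = pd.Series(self.estimator.feature_importances_, index=X_probe.columns)

        # mean importance of the 3 probes acts as the selection threshold
        threshold = importances[probe_names].mean()
        self.features_to_keep_ = [col for col in X.columns if importances[col] > threshold]
        return self

    def transform(self, X):
        return X[self.features_to_keep_]
```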
Sure @solegalli, let me have a look and I will get back to you with any doubts. Thanks a lot for tagging me here.
@solegalli @saurabhgoel1985,
I don't see a PR or this issue referenced in another issue. I'm happy to work on it!
Please confirm that no work has been started. Thanks!
It is all yours if you want it @Morgan-Sell :)
As in, there is no work on it yet.
Thanks a lot!
Yay! I'll take it. I'll focus on this issue and issue #571.