Fix repeated features for OneHotEncoder and PolynomialFeatures
Hi @solegalli,
This is my shot for fixing #489.
After checking the code in detail I think the issue affects not only PolynomialFeatures but also Sklearn OneHotEncoder.
When using SklearnTransformerWrapper + OneHotEncoder I get this:

Which is totally unexpected, since the idea is to replace Categorical Features for OneHotEncoded ones.
After the fix I'm proposing I get this for OneHotEncoder

Which I think is the expected behavior.
For PolynomialFeatures I get the following:

Please note that an OrdinalEncoder was applied to Categorical Variables before applying PolynomialFeatures, otherwise it throws an error since they are not numerical features.
I think these cases could be added as test cases if you think it's OK. But before I would love to get your feedback.
Best,
Alfonso
Hi @datacubeR
Thanks a lot for the changes.
I do agree that the idea is to replace the original categorical variables by the one hot encoded ones. So I guess, it is OK to have them removed from the dataset as you did.
Originally I thought, if that is the intended functionality, then why not use the OHE from feature engine instead of the one from sklearn? that is why I decided not to drop them. But I think it makes sense to go ahead with your suggestion.
I think it would be great to have those code snippets as tests, and also maybe in the user guide: https://feature-engine.readthedocs.io/en/latest/user_guide/wrappers/Wrapper.html#sklearn-wrapper (I mean examples of wrapping the OHE and the PolynomialFeatures)
would you be able to do that?
thanks a lot!
PS: I fixed the failing tests in the main branch last week. It was something related to the boxcox with the latest version of scipy. If you sync your main branch and then rebase it to this feature branch, it should get those sorted @datacubeR
thank you!
@solegalli I think my fork is up to date, but the failing tests is because they considering adding Features rather than replacing the existing feature for the transformed ones. I will fix those and I'll get back to you.
@solegalli do I need to create a separate PR for the examples in the User Guide or I just include those in this one?
parate PR for the exampl
Here would be good :)
Thank you!
It's looking good.
I think we need to add a test to corroborate that the method get_feature_names_out returns the original features, without the ones used in the transformation, plus the new ones, when it is called without passing the input_feature parameters.
so tr_wrap.get_feature_names_out() should return = feature_names_in - self.variables_ + new features
Could you add a test for that please? Or is it already there? I mean just for the Poly and OHE
Hey @datacubeR
This is almost ready. Some small changes here and there. Would you be able to have a look?
Thank you!
Hi @datacubeR
Thank you for looking into the get_feature_names_out() issue. I think we can ignore that request then.
Let me know when this is good to go.
Thanks a lot
Hi @solegalli, sorry for the delay. I updated my OS and had to reinstall lot of things. I just pushed all the last observations for your review. Please let me know if everything looks ok.
Best,
Alfonso
Thank you @datacubeR !!
Great fix.