feature_engine
Feature engineering package with sklearn-like functionality
Many transformers in feature engine require that y is binary. At the moment we do this check within each transformer. We should create a function that unifies this behaviour and...
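A minimal sketch of what such a shared check could look like. The name `check_binary_target` and the error message are hypothetical, chosen for illustration; they are not taken from the package:

```python
import pandas as pd


def check_binary_target(y):
    """Return y as a pandas Series, raising if it is not binary.

    Hypothetical helper: "binary" is assumed here to mean exactly two
    distinct non-null values, regardless of how they are coded.
    """
    y = pd.Series(y)
    n_classes = y.dropna().nunique()
    if n_classes != 2:
        raise ValueError(
            f"This transformer requires a binary target, but y has "
            f"{n_classes} distinct value(s)."
        )
    return y
```

Each transformer could then call this one function at the start of `fit` instead of repeating the check.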
At the moment, we have no visibility of test coverage. I would like to include this in the package. Here are some useful links I gathered: https://gist.github.com/dnozay/5b9b818ff0dc857d1358 https://pytest-cov.readthedocs.io/en/latest/tox.html https://coverage.readthedocs.io/en/6.1.1/config.html Also check...
Information value is a variable-selection method that is used with binary classifiers. Information value summarizes how much knowing `var_A` helps in predicting the dependent variable. feature-engine includes a WoE Encoder....
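For reference, the textbook information value of a categorical variable is the sum over its categories of (share of positives − share of negatives) × WoE, where WoE is the log of the ratio of those two shares. A rough sketch follows; the function name `information_value` is illustrative, and it assumes `y` is coded 0/1 and that every category contains both classes:

```python
import numpy as np
import pandas as pd


def information_value(x, y):
    """Information value of categorical x against a 0/1 target y (sketch)."""
    df = pd.DataFrame({"x": x, "y": y})
    # Share of all positives / all negatives that fall in each category.
    pos = df.loc[df["y"] == 1, "x"].value_counts(normalize=True)
    neg = df.loc[df["y"] == 0, "x"].value_counts(normalize=True)
    dist = pd.concat([pos, neg], axis=1, keys=["pos", "neg"])
    if dist.isna().any().any():
        # A category missing one class would give log(0) or division by zero.
        raise ValueError("Every category must contain both classes.")
    woe = np.log(dist["pos"] / dist["neg"])
    return float(((dist["pos"] - dist["neg"]) * woe).sum())
```

A variable whose categories have the same class mix as the whole dataset gets an information value of 0; larger values indicate more separation between the classes.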
Add missing_only functionality to all imputers to use in combination with variables=None. When variables is None, the imputers select all numerical, all categorical, or all variables by default. With the...
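One plausible reading of the proposal, sketched as a standalone helper. The function `select_variables` is hypothetical, and the assumption that `missing_only=True` keeps only columns containing at least one missing value is mine, not confirmed by the issue text:

```python
import pandas as pd


def select_variables(X, variables=None, missing_only=False):
    """Pick the columns an imputer would act on (illustrative sketch)."""
    # With variables=None, start from every column in the dataframe.
    candidates = list(X.columns) if variables is None else list(variables)
    if missing_only:
        # Assumed behaviour: keep only columns with at least one NaN.
        candidates = [c for c in candidates if X[c].isna().any()]
    return candidates
```

Under this reading, the selection would be made on the training data during `fit`, so the same columns are imputed at `transform` time.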
Holte 1993. More details coming soon
Closes #107 Notes from #107: New variables are created by combination of user indicated variables with decision trees. Example: if user passes 3 variables to transformer, a new feature will...
Closes #450. Notes from #450: Existing implementations: https://github.com/lisette-espin/pychimerge https://github.com/night18/ChiMerge https://github.com/raiyan1102006/ChiMerge https://gist.github.com/alanzchen/17d0c4a45d59b79052b1cd07f531689e?short_path=f2e54c6 Reference to the original article can be found in the first link
Some transformers such as the `CountFrequency` categorical encoder have options for handling new unseen categories during calls to `transform`. In some cases, unseen categories and explicit missing values might be...
**Reference Issues/PRs**: Issue : #429 (Add functionality for threshold) **What does this implement/fix?** Adds functionality to group categories with few observations. Hi, @solegalli I've added functionality for Backoff binning, threshold...
We can probably use the functionality of the target mean classifier or regressor.