Miguel Trejo Marrufo
Hi @solegalli, regarding your points: **1.** By default, the transformer could encode categorical variables to an ordinal representation or apply any other transformation; [here](https://github.com/nicodv/kmodes/blob/dab08c094a83ec247d9ff1cac652e607a234b3e7/kmodes/kmodes.py#L210) is an example from Kmodes. But,...
The book [Feature Engineering for Machine Learning](https://www.amazon.com/-/es/Alice-Zheng/dp/1491953241) describes the method as **K-means Featurization** and already provides a code example. ```python import numpy as np >>> from sklearn.cluster import KMeans >>>...
For the MI calculation, I'm basing it on [sklearn.feature_selection.mutual_info_regression](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_regression.html). It basically uses [scipy.special.digamma](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.digamma.html), which is on a log scale. In theory, there would be no upper bound for this measure. For two continuous...
The MI can detect non-linear, sinusoidal and linear relationships as well. [Here](https://www.kaggle.com/ryanholbrook/mutual-information) is a better explanation from a Kaggle Tutorial.
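As a rough illustration of the point above, here is a minimal sketch comparing `mutual_info_regression` with the Pearson correlation on synthetic data. The sample size, noise level, and the specific target functions are my own assumptions for the demo, not something taken from the tutorial.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Synthetic data: all values below are illustrative assumptions.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-3, 3, size=n)

targets = {
    "linear": 2 * x + rng.normal(scale=0.1, size=n),
    "quadratic": x ** 2 + rng.normal(scale=0.1, size=n),
    "sinusoidal": np.sin(3 * x) + rng.normal(scale=0.1, size=n),
    "independent": rng.normal(size=n),
}

mi_scores = {}
pearson = {}
for name, y in targets.items():
    # mutual_info_regression expects a 2D feature matrix.
    mi_scores[name] = mutual_info_regression(
        x.reshape(-1, 1), y, random_state=0
    )[0]
    pearson[name] = np.corrcoef(x, y)[0, 1]

for name in targets:
    print(f"{name:12s} MI={mi_scores[name]:.3f} pearson={pearson[name]:+.3f}")
```

In this setup the quadratic target should get a Pearson correlation close to zero while its MI stays clearly above that of the independent target, and the MI of the nearly noise-free linear target is not capped at 1.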
Yes, that's a good point, from what I know. Relations above 1 are periodic. ```python import numpy as np import matplotlib.pyplot as plt from sklearn.feature_selection import mutual_info_regression n = 100...
Regarding > On the downside, not having the threshold check, may encourage users to use any value, so for users less familiar with correlation, we would not be guiding them,...
Hi @solegalli, these are good points to consider. I summarize my answer in the following points. **1.** I think the transformer should not limit the arguments that `pandas.DataFrame.groupby` already supports...
@solegalli, it is a good idea to let the user decide whether or not to drop the original `agg_vars`. As the user has control over what to aggregate, that is which...
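To make the idea concrete, here is a minimal sketch of how the drop option could behave. The function name `add_group_aggregates`, the parameters `group_vars`, `agg_funcs` and `drop_original`, and the output column naming are all hypothetical; only `agg_vars` comes from the discussion, and this is not the transformer's actual API.

```python
import pandas as pd


def add_group_aggregates(df, group_vars, agg_vars, agg_funcs, drop_original=False):
    """Append group-level aggregates of agg_vars; optionally drop the originals.

    Hypothetical helper for illustration only.
    """
    grouped = df.groupby(group_vars)[agg_vars]
    out = df.copy()
    for func in agg_funcs:
        # transform() broadcasts each group's aggregate back to the original rows.
        transformed = grouped.transform(func)
        transformed.columns = [
            f"{var}_{func}_by_{'_'.join(group_vars)}" for var in agg_vars
        ]
        out = pd.concat([out, transformed], axis=1)
    if drop_original:
        out = out.drop(columns=agg_vars)
    return out


# Toy data, made up for the example.
df = pd.DataFrame({"city": ["a", "a", "b"], "price": [1.0, 3.0, 10.0]})
result = add_group_aggregates(
    df, ["city"], ["price"], ["mean"], drop_original=True
)
print(result)
```

Defaulting `drop_original` to `False` keeps the original columns unless the user explicitly asks to remove them, which seems like the less surprising behavior.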
I can take on the development of this issue. We should be able to specify in the contribution guidelines that contributors install and activate `pre-commit` to automatically check for code issues. This could...
Regarding > I guess the value from adding "random features" would be that it would allow us to distinguish which features add "any" value other than random. Is this the...