miceforest
Multiple Imputation with LightGBM in Python
Hi @AnotherSamWilson, the related issue is that, for a large dataset (in my case, at least), the imputed values for the same dataset differ before (trained kernel) and after (that...
https://github.com/siboehm/lleaves
Right now random state management is all over the place, and inelegant, especially inside the default mean matching functions. See if most random processes can be switched over to use...
lightgbm can handle the following; no reason we can't add them: H2O DataTable's Frame, scipy.sparse. Look into lightgbm.Sequence... might be no point.
Actually storing the latest imputation values can take up a lot of memory. We have all the information we need to generate imputation values when complete_data() is called, why not...
Getting this reliably. Chased it down to scipy.spatial.KDTree. Changing leafsize doesn't help. candidate_preds is float64, shape (294695, 1).
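Since candidate_preds has a single column here, a nearest-neighbor lookup does not strictly need a KD-tree: in one dimension, sorting plus a binary search finds the same neighbors. The sketch below is only an illustration under that assumption. The function name `nearest_candidates_1d` and its parameters are hypothetical, and this is not how miceforest implements mean matching.

```python
from bisect import bisect_left


def nearest_candidates_1d(candidate_preds, query_preds, k=5):
    """For each query value, return the indices (into candidate_preds)
    of its k nearest candidates, closest first.

    Hypothetical helper: assumes predictions are plain 1-D sequences of floats.
    """
    # Sort candidate indices by predicted value so neighbors are adjacent.
    order = sorted(range(len(candidate_preds)), key=candidate_preds.__getitem__)
    sorted_vals = [candidate_preds[i] for i in order]
    k = min(k, len(sorted_vals))

    neighbors = []
    for q in query_preds:
        # Position where q would be inserted; neighbors lie on either side.
        pos = bisect_left(sorted_vals, q)
        lo, hi = pos - 1, pos
        picked = []
        while len(picked) < k:
            # Step left if the right side is exhausted, or if the left
            # value is at least as close to q as the right value.
            take_left = hi >= len(sorted_vals) or (
                lo >= 0 and q - sorted_vals[lo] <= sorted_vals[hi] - q
            )
            if take_left:
                picked.append(order[lo])
                lo -= 1
            else:
                picked.append(order[hi])
                hi += 1
        neighbors.append(picked)
    return neighbors


candidates = [0.1, 0.9, 0.5, 0.45]
print(nearest_candidates_1d(candidates, [0.48], k=2))
```

A column of shape (n, 1) would need to be flattened to a 1-D sequence first. The sort costs O(n log n) once, and each query then costs O(log n + k).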
For correlations, could be the % of matching imputed categories. For distributions, could be a bar/boxplot (depending on datasets?) of the histogram values.
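A rough sketch of what "% of matching imputed categories" could mean between two imputed datasets, for one categorical variable. The function name and the plain-list inputs are assumptions made for illustration; nothing here is part of the miceforest API.

```python
def pct_matching_categories(imputed_a, imputed_b):
    """Percentage of positions where two imputations chose the same category.

    Hypothetical helper: both inputs hold the imputed values for the same
    missing cells, in the same order, taken from two different datasets.
    """
    if len(imputed_a) != len(imputed_b):
        raise ValueError("both imputations must cover the same missing cells")
    if not imputed_a:
        return float("nan")  # nothing was imputed, so agreement is undefined
    matches = sum(a == b for a, b in zip(imputed_a, imputed_b))
    return 100.0 * matches / len(imputed_a)


dataset_0 = ["a", "b", "a", "c"]
dataset_1 = ["a", "b", "c", "c"]
print(pct_matching_categories(dataset_0, dataset_1))
```

With more than two datasets, the same function could be applied to each pair and the results summarized in a matrix, similar to a correlation plot for numeric variables.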
Have had good experience with Bayesian optimization in the past. Lightweight implementation: https://github.com/fmfn/BayesianOptimization
Hello, I hope you are doing well! I was working with MiceForest on a toy dataset to understand how it works, and came across "LinAlgError: singular matrix" while generating...