Samuel Wilson

Results 9 issues of Samuel Wilson

Right now, random state management is inconsistent and inelegant, especially inside the default mean matching functions. See if most random processes can be switched over to use...

enhancement
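
The issue text above is cut off before it names a target, so the following is only one possible direction, not the author's stated plan. It is a minimal sketch that assumes NumPy's `Generator` API: each random step accepts a `random_state` argument and normalizes it with `np.random.default_rng`, which returns an existing `Generator` unchanged. The function name `pick_one_neighbor` and the toy index array are hypothetical.

```python
import numpy as np


def pick_one_neighbor(neighbor_indices, random_state=None):
    """Pick one candidate index per row from a (rows, neighbors) index array.

    random_state may be None, an int seed, or an existing np.random.Generator.
    (Hypothetical helper, not part of any library.)
    """
    # default_rng seeds a new Generator from None or an int, and passes an
    # existing Generator through unchanged, so callers can share one stream.
    rng = np.random.default_rng(random_state)
    n_rows, n_neighbors = neighbor_indices.shape
    chosen_cols = rng.integers(0, n_neighbors, size=n_rows)
    return neighbor_indices[np.arange(n_rows), chosen_cols]


# Toy data: 4 rows, each with 3 candidate indices.
neighbor_indices = np.arange(12).reshape(4, 3)

# The same seed gives the same draws.
first = pick_one_neighbor(neighbor_indices, random_state=42)
second = pick_one_neighbor(neighbor_indices, random_state=42)

# A shared Generator can be threaded through several calls instead.
shared_rng = np.random.default_rng(42)
third = pick_one_neighbor(neighbor_indices, random_state=shared_rng)
```

Passing one generator object through every function keeps all randomness in a single, reproducible stream rather than scattering seeds across the code.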

lightgbm can handle the following, so there is no reason we can't add them: H2O DataTable's Frame, scipy.sparse. Look into lightgbm.Sequence... might be no point.

enhancement

Actually storing the latest imputation values can take up a lot of memory. We have all the information we need to generate imputation values when complete_data() is called, why not...

enhancement

Getting this reliably. Chased it down to scipy.spatial.KDTree. Changing leafsize doesn't help. candidate_preds is float64, shape (294695, 1).

bug
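
For context on the bug report above, here is a minimal sketch of a nearest-neighbor lookup with `scipy.spatial.KDTree` on a float64 array of shape `(n, 1)`. It does not reproduce the reported failure; the array sizes, the `query_preds` name, and the choice of `k=5` are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)

# Stand-in for candidate_preds: float64, shape (n, 1). Much smaller than
# the 294695 rows in the report so the sketch runs quickly.
candidate_preds = rng.normal(size=(5000, 1))
# Hypothetical predictions for the rows that need an imputed value.
query_preds = rng.normal(size=(20, 1))

# leafsize is the parameter the report says made no difference.
tree = KDTree(candidate_preds, leafsize=16)

# For each query row, find the 5 closest candidate predictions.
distances, indices = tree.query(query_preds, k=5)

# Brute-force check on the same data: sorted absolute differences.
brute = np.sort(np.abs(query_preds - candidate_preds.T), axis=1)[:, :5]
```

Because the data is one-dimensional, the brute-force comparison is cheap and gives a quick way to check the tree's output when chasing a problem like this.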

For correlations, could be the % of matching imputed categories. For distributions, could be a bar/boxplot (depending on datasets?) of the histogram values.

enhancement
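
The "% of matching imputed categories" idea above could be sketched as follows. This assumes two imputed datasets that fill the same missing cells of one categorical column; the function name `category_match_rate` and the sample values are hypothetical.

```python
def category_match_rate(imputed_a, imputed_b):
    """Return the fraction of positions where two imputations agree.

    Both inputs hold the imputed categories for the same missing cells,
    in the same order. (Hypothetical helper for illustration.)
    """
    if len(imputed_a) != len(imputed_b):
        raise ValueError("imputations must cover the same cells")
    if len(imputed_a) == 0:
        raise ValueError("need at least one imputed value")
    matches = sum(a == b for a, b in zip(imputed_a, imputed_b))
    return matches / len(imputed_a)


# Imputed categories for the same four missing cells in two datasets.
dataset_1 = ["red", "blue", "blue", "green"]
dataset_2 = ["red", "blue", "green", "green"]

rate = category_match_rate(dataset_1, dataset_2)  # 3 of 4 agree -> 0.75
```

A rate near 1.0 would play the role that a high correlation plays for numeric columns: the imputations agree across datasets.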

Have had good experience with Bayesian optimization in the past. Lightweight implementation: https://github.com/fmfn/BayesianOptimization

enhancement

Right now it is set to the size of the data. If the data is even remotely large (10000 rows), this will cause Python to run out of memory.

Feature groups save a huge amount of time if they are sensible by improving slicing and making coalition combinations smaller.
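
To illustrate why grouping shrinks the coalition space, here is a small sketch that assumes coalitions are all subsets of the players, as in Shapley-style attribution. The feature names and the grouping are made up for the example.

```python
from itertools import combinations


def all_coalitions(players):
    """Enumerate every subset of players, including the empty set."""
    players = list(players)
    subsets = []
    for size in range(len(players) + 1):
        subsets.extend(combinations(players, size))
    return subsets


# Hypothetical features and a hypothetical "sensible" grouping of them.
features = ["age", "income", "lat", "lon", "height", "weight"]
groups = {
    "demographics": ["age", "income"],
    "location": ["lat", "lon"],
    "body": ["height", "weight"],
}

# Treating each feature as a player gives 2**6 coalitions.
feature_coalitions = all_coalitions(features)
# Treating each group as a player gives 2**3 coalitions.
group_coalitions = all_coalitions(groups)

n_feature = len(feature_coalitions)  # 64
n_group = len(group_coalitions)      # 8
```

The count halves for every player removed, so merging six features into three groups cuts the number of coalitions by a factor of eight in this toy case.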

mice and mitml allow analysis pooling; look into adding this functionality.

enhancement
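
As a rough picture of what analysis pooling involves, the sketch below applies Rubin's rules to combine one estimate and its variance across several imputed datasets. It is not the API of mice or mitml; the function name `pool_estimates` and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Pool per-dataset estimates with Rubin's rules.

    estimates: one point estimate per imputed dataset.
    variances: the squared standard error of each estimate.
    Returns (pooled_estimate, total_variance).
    (Hypothetical helper for illustration.)
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two datasets")
    pooled = mean(estimates)         # average of the point estimates
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Toy results for one regression coefficient from three imputed datasets.
pooled, total = pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what separates pooled inference from simply averaging: it widens the variance to reflect uncertainty about the missing values.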