Hleb Levitski
Could be connected to the addition of the `max_samples` feature, which generates several int arrays of shape (N, 1), but I'm not sure
Could be connected to #8361 and calculating `var` on the entire dataset when `gamma='scale'`. If true, I don't think there is anything that could be done
If the DT has its random seed locked, it will always return the same values, so we can use something like `stats.rankdata(x, method='dense')` to transform floats to integers.
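To illustrate the dense-rank idea, here is a minimal standard-library sketch. The helper name `dense_rank` and the sample values are hypothetical; the assumption is that `scipy.stats.rankdata(x, method='dense')` yields the same ranks for this kind of input, where a seeded tree returns a small, fixed set of distinct floats.

```python
def dense_rank(values):
    """Map each value to its 1-based rank among the distinct values."""
    # Sort the distinct values and number them 1..k with no gaps.
    order = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [order[v] for v in values]


# Hypothetical decision-tree outputs: a few distinct leaf values, repeated.
predictions = [0.25, 0.8, 0.25, 0.5, 0.8]
codes = dense_rank(predictions)
print(codes)  # [1, 3, 1, 2, 3]
```

Equal floats always get the same integer, so the mapping is stable as long as the tree itself is deterministic.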
From the looks of it, it is the same as `sklearn.preprocessing.KBinsDiscretizer` with `strategy='kmeans'`
@solegalli sorry, my bad, I misunderstood OP. It is a transformation of k features to 1. So it is just a regular `fit_predict` of KMeans on a subset of the data. Which is something like...
Yes, it's a rerun of #84. If someone has an implementation of this, it should be tested for OOM issues as well as speed first, since there could be...
We can translate multipliers to percentiles. For example, folds of 2 and 3 for the gaussian method are the 2-sigma and 3-sigma rules, which cover 95.5% and 99.7% respectively ([wiki article if someone...
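A rough sketch of that multiplier-to-percentile translation for the gaussian case, using only the standard library. The function names are hypothetical, and it assumes a standard normal distribution, for which the mass within k sigma of the mean is erf(k / sqrt(2)).

```python
import math


def sigma_to_coverage(k):
    """Fraction of a normal distribution lying within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))


def sigma_to_percentiles(k):
    """Lower and upper percentiles (0-100) that bound the k-sigma interval."""
    tail = (1 - sigma_to_coverage(k)) / 2  # mass in each tail
    return 100 * tail, 100 * (1 - tail)


for k in (2, 3):
    lower, upper = sigma_to_percentiles(k)
    print(k, round(100 * sigma_to_coverage(k), 2), round(lower, 3), round(upper, 3))
```

A multiplier of 2 corresponds to roughly the 2.3rd and 97.7th percentiles, and a multiplier of 3 to roughly the 0.1st and 99.9th.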
Ok, I will make a draft PR soon
@gverbock yes, the threshold will depend on the size of the dataset (and on `split_frac`) and the number of bins, which makes sense since a larger number of bins would be more sensitive to changes...
@solegalli ye, why not