themis
Extra recipes steps for dealing with unbalanced data
## The problem
I'm working with a dataset where I use importance weights to specify the misclassification costs of instances. Because the target class in the dataset is severely unbalanced,...
This is just a general problem in this package. This should also include:
- a documentation page about how `over_ratio` and `under_ratio` work
Do they specify the same thing, but with different values? Is "over-sampling the minority class proportion" the same functionality as "ratio of the majority-to-minority frequencies"? And shouldn't it be minority-to-majority...
Hello! Can we use themis to deal with imbalanced data in regression tasks (numeric response)?
## Feature
Maybe synthetic data is slightly out of scope, but GANs seem like a worthwhile addition to `themis` based on this article: https://towardsdatascience.com/the-new-step-forward-in-synthetic-data-dc854319166d There are a couple R packages...
It became a problem in this case where I couldn't get consistent results even with seeds. https://github.com/tidymodels/tidymodels.org/tree/main/content/learn/models/sub-sampling
References: https://github.com/dalpozz/unbalanced/blob/master/R/ubNCL.R https://imbalanced-learn.org/stable/references/generated/imblearn.under_sampling.NeighbourhoodCleaningRule.html
References: https://github.com/dalpozz/unbalanced/blob/master/R/ubENN.R https://imbalanced-learn.org/stable/references/generated/imblearn.under_sampling.EditedNearestNeighbours.html
References: https://github.com/dalpozz/unbalanced/blob/master/R/ubOSS.R https://imbalanced-learn.org/stable/references/generated/imblearn.under_sampling.OneSidedSelection.html
References: https://github.com/dalpozz/unbalanced/blob/master/R/ubCNN.R https://imbalanced-learn.org/stable/references/generated/imblearn.under_sampling.CondensedNearestNeighbour.html