CTGAN
Optimize batch size
Problem Description
Currently (and even after #135 is resolved), the last batch from the dataset loader is dropped if it is shorter than the batch size. This can discard a considerable portion of the dataset.
For example, if a dataset has 999 rows and the batch size is 500, only one full batch of 500 rows is used and the remaining 499 rows are dropped.
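To make the arithmetic concrete, here is a minimal sketch of how many rows are lost when the short final batch is discarded. The helper name `dropped_rows` is hypothetical and is not part of CTGAN; it only assumes that rows are consumed in full batches.

```python
def dropped_rows(n_rows: int, batch_size: int) -> int:
    """Rows left over (and therefore dropped) after taking only full batches."""
    return n_rows % batch_size


print(dropped_rows(999, 500))   # 499
print(dropped_rows(1000, 500))  # 0
```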
Expected behavior
We should find a way to adjust the batch size so that as few rows as possible are dropped, while staying as close as possible to the specified batch size.
We could consider adding a boolean optimize_batch_size argument to enable this.
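One possible approach, sketched below, is to search a window around the requested batch size and pick the candidate that leaves the fewest leftover rows, breaking ties by closeness to the requested value. This is only an illustration under stated assumptions, not a proposed implementation: the function `optimize_batch_size` and its `max_delta` and `multiple_of` parameters are hypothetical names. `multiple_of` is there in case the model constrains the batch size (for example, to even values).

```python
def optimize_batch_size(n_rows: int, batch_size: int,
                        max_delta: int = 100, multiple_of: int = 1) -> int:
    """Pick a batch size near `batch_size` that drops the fewest rows.

    Hypothetical helper: searches [batch_size - max_delta, batch_size + max_delta],
    keeps only multiples of `multiple_of`, and never exceeds `n_rows`.
    """
    if n_rows <= 0 or batch_size <= 0 or multiple_of <= 0:
        raise ValueError("n_rows, batch_size and multiple_of must be positive")

    low = max(1, batch_size - max_delta)
    high = min(n_rows, batch_size + max_delta)
    candidates = [b for b in range(low, high + 1) if b % multiple_of == 0]
    if not candidates:
        # Nothing valid in the window: fall back to the requested size.
        return batch_size

    # Fewest dropped rows first, then smallest distance to the requested size.
    return min(candidates, key=lambda b: (n_rows % b, abs(b - batch_size)))


print(optimize_batch_size(999, 500))                 # 499 (drops 1 row)
print(optimize_batch_size(999, 500, multiple_of=2))  # 498 (drops 3 rows)
```

With the 999-row example, the sketch lowers the batch size from 500 to 499, which reduces the dropped rows from 499 to 1. The size of the search window is a trade-off: a wider window can drop fewer rows but may move further from the batch size the user asked for.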