
Optimize batch size

Open csala opened this issue 4 years ago • 0 comments

Problem Description

Currently, and even after #135 is resolved, the last batch from the dataset loader is dropped if it is shorter than the batch size, which can mean dropping a considerable portion of the dataset.

For example, if a dataset has 999 rows and the batch size is 500, only one full batch is produced and the remaining 499 rows are dropped.
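The arithmetic behind this example can be checked with a small sketch. The helper name `dropped_rows` is hypothetical and is not part of CTGAN; it only assumes that an incomplete final batch is discarded, as described above.

```python
def dropped_rows(n_rows, batch_size):
    """Rows lost when the incomplete final batch is discarded (hypothetical helper)."""
    # Only full batches are kept, so the remainder of the division is dropped.
    return n_rows % batch_size


print(dropped_rows(999, 500))   # rows dropped with the example above
print(dropped_rows(1000, 500))  # an exact multiple drops nothing
```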

Expected behavior

We should think about a way to optimize the batch size to ensure that we drop the minimum number of rows possible, while still trying to get as close as possible to the specified batch size.

We could consider adding a boolean optimize_batch_size argument to enable this.
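One possible way to approach this, as a rough sketch only: search the batch sizes within some tolerance of the requested one and pick the candidate that drops the fewest rows, breaking ties in favor of the size closest to the request. The function name, the `max_deviation` tolerance, and the `multiple_of` constraint are all assumptions made for illustration; none of them exist in CTGAN today.

```python
def optimize_batch_size(n_rows, batch_size, max_deviation=0.2, multiple_of=1):
    """Pick a batch size near `batch_size` that minimizes dropped rows.

    Hypothetical sketch: `max_deviation` is the allowed relative distance
    from the requested size, and `multiple_of` restricts candidates to
    multiples of a given number (assumed knobs, not an existing API).
    """
    low = max(1, round(batch_size * (1 - max_deviation)))
    high = round(batch_size * (1 + max_deviation))

    # Keep only candidates that fit in the dataset and satisfy the constraint.
    candidates = [
        size for size in range(low, high + 1)
        if size <= n_rows and size % multiple_of == 0
    ]
    if not candidates:
        # Nothing usable in range: fall back to the requested size.
        return batch_size

    # Fewest dropped rows first, then closest to the requested size.
    return min(candidates, key=lambda size: (n_rows % size, abs(size - batch_size)))


best = optimize_batch_size(999, 500)
print(best, 999 % best)
```

With the 999-row example from above and a 20% tolerance, this search settles on a batch size just under the requested 500, so almost no rows are lost. Whether the tolerance should be relative, absolute, or configurable at all is a design choice left open here.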

csala, Mar 10 '21 18:03