
`BatchSplittingSampler` returns wrong length

Open dwahdany opened this issue 2 years ago • 2 comments

🐛 Bug

`BatchSplittingSampler` reports its length as

```python
expected_batch_size = self.sampler.sample_rate * self.sampler.num_samples
return int(len(self.sampler) * (expected_batch_size / self.max_batch_size))
```

Simply converting the result to `int` truncates it, which makes the reported number of batches one too low whenever the division is not exact. Instead, we need to take the ceiling first:

```python
expected_batch_size = self.sampler.sample_rate * self.sampler.num_samples
return int(np.ceil(len(self.sampler) * (expected_batch_size / self.max_batch_size)))
```
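To see the off-by-one concretely, here is a minimal sketch with made-up numbers (10 logical batches, an expected batch size of 100, a physical cap of 64 — none of these values come from Opacus itself):

```python
import math

# Hypothetical values for illustration only (not taken from Opacus):
num_logical_batches = 10   # len(self.sampler)
expected_batch_size = 100  # sample_rate * num_samples
max_batch_size = 64

exact = num_logical_batches * (expected_batch_size / max_batch_size)  # 15.625

# Original __len__: int() truncates, under-reporting by one batch.
truncated = int(exact)         # 15
# Proposed fix: round up so the trailing partial batch is counted.
rounded_up = math.ceil(exact)  # 16

print(truncated, rounded_up)
```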

Some libraries such as PyTorch Lightning will skip the last batch if this length is reported wrong, resulting in no actual optimizer step occurring at all.
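The skipping behaviour can be simulated without Lightning. A training loop that trusts `__len__` and stops at the reported count never sees the final physical batch, which (under gradient accumulation) is exactly the batch that would trigger the optimizer step. A toy sketch with invented batch names, not Lightning internals:

```python
physical_batches = ["b0", "b1", "b2", "b3"]  # sampler actually yields 4
reported_len = len(physical_batches) - 1     # truncated __len__ reports 3

consumed = []
for i, batch in enumerate(physical_batches):
    if i >= reported_len:  # a framework that trusts the length stops here
        break
    consumed.append(batch)

print(consumed)  # the final batch "b3" is never processed
```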

dwahdany avatar Mar 22 '24 18:03 dwahdany

Thanks for contributing to Opacus. Will take a look.

HuanyuZhang avatar Apr 24 '24 14:04 HuanyuZhang

> Thanks for contributing to Opacus. Will take a look.

Have you had the time to look into this? It seems like a straightforward fix.

dwahdany avatar May 07 '24 19:05 dwahdany

Thanks for contributing to Opacus! The fix makes sense to me. Just a qq: what is the point of using `int(np.ceil())`? How about just using `math.ceil()`?

HuanyuZhang avatar May 13 '24 01:05 HuanyuZhang

> Thanks for contributing to Opacus! The fix makes sense to me. Just a qq: what is the point of using `int(np.ceil())`? How about just using `math.ceil()`?

Thanks for pointing that out. Bad habit, I guess; `math.ceil` is better. I changed it to `math.ceil`.
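For reference, `math.ceil` already returns a plain `int` in Python 3 (whereas `np.ceil` returns a float, which is why the `int(...)` wrapper was needed), so the NumPy dependency can be dropped entirely. A quick check:

```python
import math

x = 15.625
# math.ceil returns a plain int in Python 3, no int() wrapping needed:
result = math.ceil(x)
print(type(result).__name__, result)  # int 16
```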

dwahdany avatar May 17 '24 14:05 dwahdany

This patch still computes the expected number of batches incorrectly. See #516.

s-zanella avatar May 21 '24 14:05 s-zanella