
`BatchSplittingSampler` returns wrong length

Open dwahdany opened this issue 2 years ago • 2 comments

🐛 Bug

`BatchSplittingSampler` reports its length as

```python
expected_batch_size = self.sampler.sample_rate * self.sampler.num_samples
return int(len(self.sampler) * (expected_batch_size / self.max_batch_size))
```

Simply converting the result to `int` truncates it, which makes the reported number of batches one too low whenever the division is not exact. Instead, we need to take the ceiling first:

```python
expected_batch_size = self.sampler.sample_rate * self.sampler.num_samples
return int(np.ceil(len(self.sampler) * (expected_batch_size / self.max_batch_size)))
```
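To see the off-by-one concretely, here is a minimal sketch with made-up numbers (10 logical batches, an expected batch size of 100, a physical cap of 64 — none of these values come from Opacus itself):

```python
import math

# Hypothetical values for illustration only (not taken from Opacus):
num_logical_batches = 10   # len(self.sampler)
expected_batch_size = 100  # sample_rate * num_samples
max_batch_size = 64

exact = num_logical_batches * (expected_batch_size / max_batch_size)  # 15.625

# Original __len__: int() truncates, under-reporting by one batch.
truncated = int(exact)         # 15
# Proposed fix: round up so the trailing partial batch is counted.
rounded_up = math.ceil(exact)  # 16

print(truncated, rounded_up)
```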

Some libraries such as PyTorch Lightning will skip the last batch if this length is reported wrong, resulting in no actual optimizer step occurring at all.
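The skipping behaviour can be simulated without Lightning. A training loop that trusts `__len__` and stops at the reported count never sees the final physical batch, which (under gradient accumulation) is exactly the batch that would trigger the optimizer step. A toy sketch with invented batch names, not Lightning internals:

```python
physical_batches = ["b0", "b1", "b2", "b3"]  # sampler actually yields 4
reported_len = len(physical_batches) - 1     # truncated __len__ reports 3

consumed = []
for i, batch in enumerate(physical_batches):
    if i >= reported_len:  # a framework that trusts the length stops here
        break
    consumed.append(batch)

print(consumed)  # the final batch "b3" is never processed
```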

dwahdany avatar Mar 22 '24 18:03 dwahdany

Thanks for contributing to Opacus. Will take a look.

HuanyuZhang avatar Apr 24 '24 14:04 HuanyuZhang

> Thanks for contributing to Opacus. Will take a look.

Have you had the time to look into this? It seems like a straightforward fix.

dwahdany avatar May 07 '24 19:05 dwahdany

Thanks for contributing to Opacus! The fix makes sense to me. Just a qq: what is the point of using `int(np.ceil())`? How about just using `math.ceil()`?

HuanyuZhang avatar May 13 '24 01:05 HuanyuZhang

> Thanks for contributing to Opacus! The fix makes sense to me. Just a qq: what is the point of using `int(np.ceil())`? How about just using `math.ceil()`?

Thanks for pointing that out. Bad habit, I guess; `math.ceil` is better. I changed it to `math.ceil`.
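For reference, `math.ceil` already returns a plain `int` in Python 3 (whereas `np.ceil` returns a float, which is why the `int(...)` wrapper was needed), so the NumPy dependency can be dropped entirely. A quick check:

```python
import math

x = 15.625
# math.ceil returns a plain int in Python 3, no int() wrapping needed:
result = math.ceil(x)
print(type(result).__name__, result)  # int 16
```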

dwahdany avatar May 17 '24 14:05 dwahdany

This patch still computes the expected number of batches incorrectly. See #516.

s-zanella avatar May 21 '24 14:05 s-zanella