
Invalid combination of arguments related to empty batch

SoumiDas opened this issue 1 year ago • 3 comments

Hi,

I have been trying to do DP-based fine-tuning on a dataset using the Pythia 1B model. I receive the following error at epoch 5 when I increase the dataset size to around 1000.

TypeError: zeros() received an invalid combination of arguments - got (tuple, dtype=type), but expected one of:

  • (tuple of ints size, *, tuple of names names, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
  • (tuple of ints size, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

This arises from lines 60-61 of opacus/data_loader.py, which check if len(batch) > 0 and try to collate. Where am I going wrong, and what could be a workaround?

Please help!

P.S. The configuration I use is: number of epochs = 5, training set size = 1000, batch size = 8, and I am using BatchMemoryManager with max_physical_batch_size = 8.

SoumiDas avatar Sep 21 '24 14:09 SoumiDas

Are you using Poisson subsampling? Also, can you use the bug report Colab so that we can look at your code and reproduce the issue?

EnayatUllah avatar Sep 22 '24 22:09 EnayatUllah

@SoumiDas This looks like a problem with the batch size, although it is very strange. I was facing the exact same issue with batch size 8. Later, I changed the batch size to 12 and the problem was resolved.
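One possible explanation for the batch-size dependence (an assumption, not confirmed in the thread): with Poisson subsampling, each example joins a batch independently with probability q = batch_size / len(dataset), so a logical batch is empty with probability (1 - q)^N, and the empty-batch code path mentioned above is hit far less often at larger expected batch sizes. A back-of-envelope check:

# Probability of drawing an empty Poisson-subsampled batch, and the expected
# number of empty batches over a run (values follow the thread's setup).
N = 1000                      # training set size
steps = 5 * (N // 8)          # ~5 epochs at an expected batch size of 8

for batch_size in (8, 12):
    q = batch_size / N        # per-example inclusion probability
    p_empty = (1 - q) ** N    # P(every example excluded from the batch)
    print(batch_size, p_empty, steps * p_empty)
# Expected empty batches over the run: roughly 0.2 at batch size 8 versus
# roughly 0.004 at batch size 12, which could explain why the crash
# disappears when the batch size is increased.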

kanchanchy avatar Oct 13 '24 20:10 kanchanchy

@kanchanchy, thanks for sharing this! I tried to reproduce the issue with the setting mentioned in the post: number of epochs = 5, training set = 1000, batch size = max_physical_batch_size = 8, with BatchMemoryManager (with noise_multiplier = 0.1 and max_grad_norm = 1.0), on a toy dataset and model, but I wasn't able to.
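For reference, a minimal sketch of such a reproduction attempt (the toy model and dataset here are hypothetical stand-ins; the privacy parameters follow the ones above):

import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine
from opacus.utils.batch_memory_manager import BatchMemoryManager

model = torch.nn.Linear(16, 2)  # toy stand-in for Pythia 1B
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))
data_loader = DataLoader(dataset, batch_size=8)

privacy_engine = PrivacyEngine()
# make_private wraps the loader in a Poisson-sampled DPDataLoader
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=0.1,
    max_grad_norm=1.0,
)

criterion = torch.nn.CrossEntropyLoss()
for epoch in range(5):
    with BatchMemoryManager(
        data_loader=data_loader,
        max_physical_batch_size=8,
        optimizer=optimizer,
    ) as memory_safe_loader:
        for x, y in memory_safe_loader:
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()

Note that with a TensorDataset, dataset[0] is a tuple of tensors, so the empty-batch fallback discussed below works as intended, which may be why a toy setup like this does not reproduce the crash.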

It's possible that I am missing something, so it would be great if one of you (@kanchanchy or @SoumiDas) could reproduce this in the bug report Colab. Thanks!

EnayatUllah avatar Oct 15 '24 17:10 EnayatUllah

Closing the issue due to no response. Feel free to re-open it.

HuanyuZhang avatar Jun 20 '25 12:06 HuanyuZhang

It seems that the DPDataLoader might not construct the correct sample_empty_shapes and dtypes for empty batches when training models on HF datasets stored in a {'input_ids': [], 'attention_mask': [], 'labels': []} format.

The error seems to arise from lines 60-63 of https://github.com/pytorch/opacus/blob/main/opacus/data_loader.py:

if len(batch) > 0:
    # non-empty batch: defer to the real collate function
    return collate_fn(batch)
else:
    # empty batch (possible under Poisson subsampling): synthesize
    # zero-length tensors from the precomputed per-field shapes and dtypes
    return [
        torch.zeros(shape, dtype=dtype)
        for shape, dtype in zip(sample_empty_shapes, dtypes)
    ]
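As a side note, the exact TypeError from the report can be reproduced by handing torch.zeros a plain Python type where a torch.dtype is expected (the shape here is just illustrative):

import torch

# Passing a Python type (here str) rather than a torch.dtype reproduces the
# reported failure signature: "got (tuple, dtype=type)".
torch.zeros((0,), dtype=str)
# TypeError: zeros() received an invalid combination of arguments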

This is likely because of lines 190-191, where the empty shapes and dtypes are derived from the first sample:

# for a dict-style HF sample, iterating over dataset[0] yields its string
# keys rather than per-field tensors
sample_empty_shapes = [(0, *shape_safe(x)) for x in dataset[0]]
dtypes = [dtype_safe(x) for x in dataset[0]]
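To illustrate the suspected failure mode (a sketch; it assumes shape_safe and dtype_safe fall back to () and type(x) for objects without .shape/.dtype attributes):

# A dict-style HF sample: iterating over it yields string keys, not tensors.
sample = {"input_ids": [101, 2023], "attention_mask": [1, 1], "labels": [0, 1]}
print(list(sample))   # ['input_ids', 'attention_mask', 'labels']

# Under the assumed fallbacks, dtype_safe('input_ids') == str, a Python type,
# so the empty-batch branch ends up calling torch.zeros(..., dtype=str),
# which raises exactly the TypeError reported above.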

I could create a notebook to reproduce this in a few days if that might help.

kr-ramesh avatar Jul 15 '25 23:07 kr-ramesh