Invalid combination of arguments related to empty batch
Hi,
I have been trying to do DP-based fine-tuning on a dataset using the Pythia 1B model. I receive the following error at epoch 5 when I increase the dataset size to around 1000.
TypeError: zeros() received an invalid combination of arguments - got (tuple, dtype=type), but expected one of:
- (tuple of ints size, *, tuple of names names, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
- (tuple of ints size, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
This arises from lines 60-61 of opacus/data_loader.py, which check if len(batch) > 0 and try to collate. Where am I going wrong, or what could be a workaround?
Please help!
P.S. The configuration I use is: number of epochs = 5, training set = 1000, batch size = 8, and I am using BatchMemoryManager with max_physical_batch_size = 8.
Are you using Poisson subsampling? Also, could you use the bug report Colab so that we can look at your code and reproduce the issue?
@SoumiDas Looks like a problem with the batch size, although it's very strange. I was facing the exact same issue with batch size 8. Later, I changed the batch size to 12 and the problem was resolved.
@kanchanchy, thanks for sharing this! I tried to reproduce the issue with the setting mentioned in the post: number of epochs = 5, training set = 1000, batch size = max_physical_batch_size = 8, with BatchMemoryManager (with noise_multiplier = 0.1 and max_grad_norm =1.0), on a toy dataset and model, but I wasn't able to.
It's possible that I am missing something, so it would be great if one of you (@kanchanchy or @SoumiDas) could reproduce this in the bug report Colab. Thanks!
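For what it's worth, empty batches are rare events under Poisson sampling, which would make this hard to reproduce deterministically and could also explain why a different batch size appears to "fix" it. A rough back-of-the-envelope sketch (pure Python; assumes sampling rate q = batch_size / dataset_size, which is how Opacus's Poisson sampler is typically configured):

```python
def p_any_empty_batch(dataset_size, batch_size, epochs):
    # Poisson sampling: each example is included in a given batch
    # independently with probability q = batch_size / dataset_size.
    q = batch_size / dataset_size
    p_empty = (1 - q) ** dataset_size          # probability one batch is empty
    n_batches = epochs * dataset_size // batch_size
    # Probability that at least one of the n_batches draws is empty.
    return 1 - (1 - p_empty) ** n_batches

print(p_any_empty_batch(1000, 8, 5))   # ~0.18: empty batch quite likely in 5 epochs
print(p_any_empty_batch(1000, 12, 5))  # ~0.002: empty batch very unlikely
```

So with batch size 8 an empty batch is expected roughly one run in five, while with batch size 12 it almost never happens, matching what was reported above.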
Closing the issue due to no response. Feel free to reopen it.
It seems that the DPDataLoader might not construct the correct sample_empty_shapes and dtypes for empty batches when training models over HF datasets in a {'input_ids': [], 'attention_mask':[], 'labels':[]} format.
The error seems to arise from lines 60-63 in https://github.com/pytorch/opacus/blob/main/opacus/data_loader.py.
```python
if len(batch) > 0:
    return collate_fn(batch)
else:
    return [
        torch.zeros(shape, dtype=dtype)
        for shape, dtype in zip(sample_empty_shapes, dtypes)
    ]
```
This is likely because of lines 190-191:

```python
sample_empty_shapes = [(0, *shape_safe(x)) for x in dataset[0]]
dtypes = [dtype_safe(x) for x in dataset[0]]
```
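A minimal sketch of the failure mode, using simplified stand-ins for `shape_safe` and `dtype_safe` (an assumption: that they fall back to `len()` and `type()` for non-tensor inputs, which is what the `dtype=type` in the error message suggests). Iterating over a dict-style HF sample yields its string keys, so the precomputed shapes and dtypes describe the key strings rather than the features:

```python
# Simplified stand-ins (assumption) for the helpers used in data_loader.py.
def shape_safe(x):
    if hasattr(x, "shape"):
        return tuple(x.shape)
    return (len(x),) if hasattr(x, "__len__") else ()

def dtype_safe(x):
    return x.dtype if hasattr(x, "dtype") else type(x)

# An HF-datasets sample is a dict, so iterating over it yields the *keys*.
sample = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1], "labels": [0]}

sample_empty_shapes = [(0, *shape_safe(x)) for x in sample]
dtypes = [dtype_safe(x) for x in sample]

print(sample_empty_shapes)  # [(0, 9), (0, 14), (0, 6)] -- lengths of the key strings
print(dtypes)               # [<class 'str'>, <class 'str'>, <class 'str'>]

# On an empty batch, torch.zeros((0, 9), dtype=str) would then raise exactly the
# reported TypeError, since str is a Python type, not a torch.dtype.
```

This would explain why training runs fine until Poisson sampling happens to produce an empty batch, at which point the bogus shapes/dtypes are finally used.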
I could create a notebook to reproduce this in a few days if that might help.