Validation step raises a RuntimeError
Dear Team,
We would like to compare the ASR results we have obtained with wav2vec2 and Whisper architectures against your SummaryMixing one.
We are running a custom ASR training; our dataset consists of 95,000 records for training, 16,000 for validation, and 17,000 for testing.
The training phase completed without issue on an A100 40 GB GPU with the following parameters (launch command sketched below):
- precision=bf16
- batch_size=12
- number_of_epochs=60
- token_type=bpe
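For reference, the training launch looks roughly like this; the hparams file name is a placeholder for our own config, and the overrides go through SpeechBrain's usual command-line override mechanism:

```
python train.py hparams/summarymixing_branchformer.yaml \
    --precision=bf16 \
    --batch_size=12 \
    --number_of_epochs=60 \
    --token_type=bpe
```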
However, during the validation step of epoch 1, we hit the following error:
speechbrain.utils.epoch_loop - Going into epoch 1
  0%|          | 0/8660 [00:00<?, ?it/s]
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:5109: UserWarning: Support for mismatched key_padding_mask and attn_mask is deprecated. Use same type for both instead.
  warnings.warn(
 44%|████▍     | 3803/8660 [15:03<18:27, 4.39it/s, train_loss=93.3]
speechbrain.utils.checkpoints - Saving an empty state_dict for <torch.cuda.amp.grad_scaler.GradScaler object at 0x7fe0401657c0> in /data/outputs/save/CKPT+2024-05-31+14-19-54+00/scaler.ckpt.
 88%|████████▊ | 7586/8660 [30:12<04:18, 4.15it/s, train_loss=80.9]
speechbrain.utils.checkpoints - Saving an empty state_dict for <torch.cuda.amp.grad_scaler.GradScaler object at 0x7fe0401657c0> in /data/outputs/save/CKPT+2024-05-31+14-35-03+00/scaler.ckpt.
100%|██████████| 8660/8660 [34:15<00:00, 4.21it/s, train_loss=78.7]
0%| | 0/1291 [00:00<?, ?it/s]
speechbrain.core - Exception:
Traceback (most recent call last):
  File "train.py", line 442, in <module>
    asr_brain.fit(
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/core.py", line 1556, in fit
    self._fit_valid(valid_set=valid_set, epoch=epoch, enable=enable)
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/core.py", line 1462, in _fit_valid
    loss = self.evaluate_batch(batch, stage=Stage.VALID)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/core.py", line 1345, in evaluate_batch
    out = self.compute_forward(batch, stage=stage)
  File "train.py", line 68, in compute_forward
    enc_out, pred = self.modules.Transformer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/data_parallel.py", line 183, in forward
    return self.module(*inputs[0], **module_kwargs[0])
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/lobes/models/transformer/TransformerASR.py", line 381, in forward
    encoder_out, _ = self.encoder(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/lobes/models/transformer/Branchformer.py", line 480, in forward
    output, attention = enc_layer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/lobes/models/transformer/Branchformer.py", line 277, in forward
    x2 = self._forward_cnn_branch(x2)
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/lobes/models/transformer/Branchformer.py", line 293, in _forward_cnn_branch
    x = self.convolution_branch(x)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/lobes/models/transformer/Branchformer.py", line 94, in forward
    x = self.csgu(x)  # (B, T, D//2)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/lobes/models/convolution.py", line 99, in forward
    x2 = self.conv(x2)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/nnet/CNN.py", line 428, in forward
    x = self._manage_padding(
  File "/usr/local/lib/python3.8/dist-packages/speechbrain/nnet/CNN.py", line 480, in _manage_padding
    x = F.pad(x, padding, mode=self.padding_mode)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 4495, in pad
    return torch._C._nn.pad(input, pad, mode, value)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (15, 15) at dimension 2 of input [12, 1536, 13]
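For what it's worth, the failure seems to come from the "reflect" padding in the CSGU convolution branch: a padding of (15, 15) would correspond to a kernel size of 31, and F.pad with a non-constant mode requires the padding to be strictly smaller than the padded dimension, while the time dimension here is only 13. This minimal sketch, with the shape copied from the error message, reproduces the same RuntimeError outside the training loop:

```python
import torch
import torch.nn.functional as F

# Shape copied from the error message: (batch, channels, time) = (12, 1536, 13).
x = torch.randn(12, 1536, 13)

# speechbrain.nnet.CNN.Conv1d pads with mode="reflect" by default, and a
# kernel size of 31 gives "same" padding of (31 - 1) // 2 = 15 frames per
# side. Reflect padding requires padding < input size along the padded
# dimension, and 15 >= 13, hence the crash.
F.pad(x, (15, 15), mode="reflect")
# RuntimeError: Argument #4: Padding size should be less than the
# corresponding input dimension, but got: padding (15, 15) at dimension 2
# of input [12, 1536, 13]
```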
What's wrong? Our current guess is that one of the validation recordings is very short: the tensor reaching the convolution has only 13 time frames, which is fewer than the 15 frames of padding required on each side. Is there a minimum input length for the SummaryMixing Branchformer encoder, and is filtering such short records (or reducing the CSGU kernel size) the recommended fix?
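If that reading is right, then with 16 kHz audio, a 10 ms feature hop, and 4x convolutional subsampling (all assumptions on our side, since we have not dumped the exact front-end), 13 encoder frames correspond to roughly 13 x 4 x 10 ms, i.e. about 0.52 s of audio. Here is the kind of hypothetical check we could run over our validation manifest to flag such records:

```python
import torchaudio

def encoder_frames(path: str, hop_ms: float = 10.0, subsampling: int = 4) -> int:
    """Rough encoder-side frame count for one file, assuming a 10 ms
    feature hop and a fixed 4x front-end subsampling (our assumptions)."""
    info = torchaudio.info(path)
    duration_ms = info.num_frames / info.sample_rate * 1000.0
    return int(duration_ms / hop_ms) // subsampling

# `valid_paths` is a placeholder for our list of validation wav paths:
# flag anything that would not survive a (15, 15) reflect pad.
too_short = [p for p in valid_paths if encoder_frames(p) <= 15]
```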
Thanks for your support.