Contrastive Loss: Runtime error for certain audio files when reshaping out_masked_only
Describe the bug
After a few steps of pretraining a SpeechEncDecSelfSupervisedModel, training fails with the following error:
File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/models/ssl_models.py", line 468, in training_step
loss_value, loss_val_dict = self.decoder_loss_step(
File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/models/ssl_models.py", line 450, in decoder_loss_step
current_loss_value = current_loss(
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1129, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/nemo/core/classes/common.py", line 963, in __call__
outputs = wrapped(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/losses/ssl_losses/contrastive.py", line 187, in forward
out_masked_only = out_masked_only.reshape(bs, -1, out_masked_only.shape[-1])
RuntimeError: shape '[16, -1, 128]' is invalid for input of size 547712
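The numbers in the error message can be checked by hand. The short sketch below uses only the values from the traceback (size 547712, requested shape [16, -1, 128]); the variable names are illustrative and are not taken from NeMo.

```python
# Numbers taken from the RuntimeError above.
total_elements = 547712  # size of out_masked_only before the reshape
batch_size = 16          # first dimension of the requested shape
feature_dim = 128        # last dimension of the requested shape

# Number of rows of length feature_dim in the flattened tensor.
masked_rows, leftover = divmod(total_elements, feature_dim)

# reshape(bs, -1, feature_dim) needs masked_rows to be a multiple of bs.
rows_per_item, remainder = divmod(masked_rows, batch_size)
fits_batch = leftover == 0 and remainder == 0

print(f"rows={masked_rows}, rows per item={rows_per_item}, remainder={remainder}")
print(f"reshape possible: {fits_batch}")
```

The tensor holds 4279 rows of 128 features, and 4279 is not a multiple of 16 (the remainder is 7), so the rows cannot be split evenly over the batch. If the rows are the masked time steps gathered across the whole batch, this would mean the utterances did not all contribute the same number of masked steps. That reading is an inference from the shapes, not something confirmed in this thread.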
Steps/Code to reproduce bug
I used the speech_pre_training.py script with the default Conformer configuration.
The only change was train_ds.max_duration: 25.0.
I also tried the config in the Self_Supervised_Pre_Training.ipynb notebook and I got the same error.
Expected behavior
Training should proceed normally, or the error should explain in more detail why the reshape failed and which parameters to change.
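As a rough sketch of what a more descriptive failure could look like, the check below validates the row count before a reshape of this kind. The function name `check_masked_rows` and the message wording are hypothetical and are not part of NeMo; only the option name `sample_from_same_utterance_only` comes from this report.

```python
def check_masked_rows(num_masked_rows: int, batch_size: int) -> int:
    """Return the masked steps per utterance, or raise a descriptive error.

    Hypothetical helper: mirrors the divisibility requirement of
    reshape(batch_size, -1, feature_dim) on a tensor with num_masked_rows rows.
    """
    steps_per_item, remainder = divmod(num_masked_rows, batch_size)
    if remainder != 0:
        raise ValueError(
            f"Cannot split {num_masked_rows} masked steps evenly across "
            f"a batch of {batch_size} (remainder {remainder}). "
            "With sample_from_same_utterance_only=True every utterance is "
            "expected to contribute the same number of masked steps; "
            "setting it to False is a known workaround."
        )
    return steps_per_item


# Values from the traceback: 547712 / 128 = 4279 rows, batch size 16.
try:
    check_masked_rows(4279, 16)
except ValueError as err:
    print(err)
```

A check like this fails at the same point as the original reshape but names the batch size, the row count, and the relevant option.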
Environment overview (please complete the following information)
- Environment location: Docker (nvcr.io/nvidia/pytorch:22.05-py3)
- Method of NeMo install: pip install 'nemo_toolkit[all]==1.10.0'
- If method of install is [Docker], provide docker pull & docker run commands used: docker run --rm -it --gpus all --ipc=host --env-file .env train
Btw, the current workaround is to set loss_list.contrastive.loss.sample_from_same_utterance_only=False, but ideally training would also work with it set to True.
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.