Failed to generate timestamp for nvidia/parakeet-tdt-1.1b
Describe the bug
When I try to generate timestamps with the model nvidia/parakeet-tdt-1.1b, I get the following error:
ValueError: char_offsets: [{'char': [tensor(607, dtype=torch.int32)], 'start_offset': 28, 'end_offset': 29}....
Call stack:
Traceback (most recent call last):
  File "/tmp/inference/nvidia_asr.py", line 103, in <module>
    main()
  File "/tmp/inference/nvidia_asr.py", line 94, in main
    tt = parakeet_rnnt( audio, 'tdt' )
  File "/tmp/inference/nvidia_asr.py", line 45, in parakeet_rnnt
    hypothesis = asr_model.transcribe([audio], return_hypotheses=True)[0][0]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/models/rnnt_models.py", line 298, in transcribe
    best_hyp, all_hyp = self.decoding.rnnt_decoder_predictions_tensor(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/metrics/rnnt_wer.py", line 497, in rnnt_decoder_predictions_tensor
    hypotheses[hyp_idx] = self.compute_rnnt_timestamps(hypotheses[hyp_idx], timestamp_type)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/metrics/rnnt_wer.py", line 699, in compute_rnnt_timestamps
    raise ValueError(
Steps/Code to reproduce bug
The code below reproduces the bug. (The same code produces timestamps without error when the parakeet-rnnt-1.1b model is used instead.)
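import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

# `audio` is defined elsewhere in the script (presumably the path to the input audio file).
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")

# Enable word-level timestamps in the decoding config.
decoding_cfg = asr_model.cfg.decoding
with open_dict(decoding_cfg):
    decoding_cfg.preserve_alignments = True
    decoding_cfg.compute_timestamps = True
    decoding_cfg.rnnt_timestamp_type = 'word'
asr_model.change_decoding_strategy(decoding_cfg)

# This call raises the ValueError shown above.
hypothesis = asr_model.transcribe([audio], return_hypotheses=True)[0][0]
timestamp_dict = hypothesis.timestep
word_timestamps = timestamp_dict['word']
print(word_timestamps)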
Expected behavior
It should output word timestamps instead of raising an exception.
Environment overview (please complete the following information)
- Environment location: local Ubuntu 22.04 machine
- Method of NeMo install: pip install nemo_toolkit['all']
Environment details
- OS version: Ubuntu 22.04
- PyTorch version: 2.2.0+cu118
- Python version: 3.10.12
Additional context
GPU model: GTX 1080T
Also seeing this error.
My temporary workaround is to catch the ValueError and append a second or two of blank audio to the end of the file before re-processing it. This seems to work as a stop-gap until the bug is fixed.
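A minimal sketch of that workaround, assuming the audio is already loaded as a sequence of float samples. The names `pad_with_silence`, `transcribe_with_padding_retry`, and `transcribe_fn` are hypothetical and not part of NeMo; `transcribe_fn` stands in for whatever wrapper you use to run the model on the samples (for example, writing them to a temporary file and calling `asr_model.transcribe`). The sketch uses plain Python lists to stay dependency-free.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def pad_with_silence(samples: Sequence[float], sample_rate: int, seconds: float = 1.0) -> List[float]:
    """Return a copy of `samples` with `seconds` of zero-valued samples appended."""
    n_pad = int(round(sample_rate * seconds))
    return list(samples) + [0.0] * n_pad


def transcribe_with_padding_retry(
    transcribe_fn: Callable[[List[float]], T],  # hypothetical wrapper around the model call
    samples: Sequence[float],
    sample_rate: int,
    pad_seconds: float = 1.0,
    max_retries: int = 2,
) -> T:
    """Call `transcribe_fn`; on ValueError, append silence and try again."""
    current = list(samples)
    for attempt in range(max_retries + 1):
        try:
            return transcribe_fn(current)
        except ValueError:
            # Out of retries: surface the original error to the caller.
            if attempt == max_retries:
                raise
            current = pad_with_silence(current, sample_rate, pad_seconds)
    raise AssertionError("unreachable")
```

Note that the appended silence only extends the end of the audio, so timestamps for the original speech should be unaffected, but this hides the underlying bug rather than fixing it.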
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Adding comment to prevent this issue from closing.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
@bradmurray-dt Still facing this issue with 'parakeet-tdt-1.1b' and 'parakeet-tdt-ctc-1.1b':
  File "/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/rnnt_decoding.py", line 510, in rnnt_decoder_predictions_tensor
    hypotheses[hyp_idx] = self.compute_rnnt_timestamps(hypotheses[hyp_idx], timestamp_type)
  File "/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/rnnt_decoding.py", line 753, in compute_rnnt_timestamps
    raise ValueError(
ValueError: `char_offsets`: [{'char': [tensor(386, dtype=torch.int32)], 'start_offset': 2, 'end_offset': 3},.....
have to be of the same length, but are: `len(offsets)`: 102 and `len(processed_tokens)`: 103