
Failed to generate timestamp for nvidia/parakeet-tdt-1.1b

Open leohuang2013 opened this issue 1 year ago • 4 comments

Describe the bug

When I tried to generate timestamps with the model nvidia/parakeet-tdt-1.1b, I got the following error: ValueError: char_offsets: [{'char': [tensor(607, dtype=torch.int32)], 'start_offset': 28, 'end_offset': 29}....

Call stack:

Traceback (most recent call last):
  File "/tmp/inference/nvidia_asr.py", line 103, in <module>
    main()
  File "/tmp/inference/nvidia_asr.py", line 94, in main
    tt = parakeet_rnnt( audio, 'tdt' )
  File "/tmp/inference/nvidia_asr.py", line 45, in parakeet_rnnt
    hypothesis = asr_model.transcribe([audio], return_hypotheses=True)[0][0]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/models/rnnt_models.py", line 298, in transcribe
    best_hyp, all_hyp = self.decoding.rnnt_decoder_predictions_tensor(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/metrics/rnnt_wer.py", line 497, in rnnt_decoder_predictions_tensor
    hypotheses[hyp_idx] = self.compute_rnnt_timestamps(hypotheses[hyp_idx], timestamp_type)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/metrics/rnnt_wer.py", line 699, in compute_rnnt_timestamps
    raise ValueError(

Steps/Code to reproduce bug

The code below reproduces the bug. (The same code does produce timestamps when used with the parakeet rnnt-1.1b model.)

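import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

# `audio` is defined elsewhere in the original script (presumably the path to the input audio file).
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")

# Enable word-level timestamps in the decoding config.
decoding_cfg = asr_model.cfg.decoding
with open_dict(decoding_cfg):
    decoding_cfg.preserve_alignments = True
    decoding_cfg.compute_timestamps = True
    decoding_cfg.rnnt_timestamp_type = 'word'
asr_model.change_decoding_strategy(decoding_cfg)

# The ValueError is raised inside this call.
hypothesis = asr_model.transcribe([audio], return_hypotheses=True)[0][0]
timestamp_dict = hypothesis.timestep
word_timestamps = timestamp_dict['word']
print(word_timestamps)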

Expected behavior

It should output word timestamps instead of raising an exception.

Environment overview (please complete the following information)

  • Environment location: local Ubuntu 22.04 machine.
  • Method of NeMo install: pip install nemo_toolkit['all']

Environment details

If NVIDIA docker image is used you don't need to specify these. Otherwise, please provide:

  • OS version: ubuntu 22.04
  • PyTorch version: 2.2.0+cu118
  • Python version: 3.10.12

Additional context

GPU model: GTX 1080T

leohuang2013, Feb 17 '24 10:02

Also seeing this error.

My temporary workaround is to catch the ValueError, append a second or two of blank audio to the end of the file, and re-process it. This seems to work as a stop-gap until the bug is fixed.
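
The comment above does not include code. A minimal sketch of how such a workaround might look is shown here, assuming the input is a PCM WAV file; `append_silence`, `transcribe_with_padding_retry`, and `transcribe_fn` are hypothetical names introduced for illustration and are not part of NeMo.

```python
import wave


def append_silence(src_path: str, dst_path: str, seconds: float = 1.0) -> None:
    """Copy a PCM WAV file to dst_path with `seconds` of trailing silence."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    # 8-bit PCM WAV is unsigned (silence is 0x80); wider samples are signed (silence is 0x00).
    silence_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    n_pad_frames = int(round(params.framerate * seconds))
    silence = silence_byte * (n_pad_frames * params.nchannels * params.sampwidth)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The wave module corrects the frame count in the header on close.
        dst.writeframes(frames + silence)


def transcribe_with_padding_retry(transcribe_fn, audio_path, padded_path, pad_seconds=1.0):
    """Call transcribe_fn(audio_path); on ValueError, retry once on a padded copy.

    `transcribe_fn` is a hypothetical callable that takes a file path and
    returns a hypothesis. A second ValueError is not caught and propagates.
    """
    try:
        return transcribe_fn(audio_path)
    except ValueError:
        append_silence(audio_path, padded_path, pad_seconds)
        return transcribe_fn(padded_path)
```

With the model from the original report, `transcribe_fn` could wrap the same call used there, for example `lambda path: asr_model.transcribe([path], return_hypotheses=True)[0][0]`. This only sidesteps the error; it does not address the length mismatch inside the timestamp computation.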

isaac-mcfadyen, Mar 05 '24 16:03

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], Apr 05 '24 01:04

Adding comment to prevent this issue from closing.

bradmurray-dt, Apr 09 '24 15:04

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], May 10 '24 01:05

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot], May 18 '24 01:05

@bradmurray-dt Still facing this issue with 'parakeet-tdt-1.1b' and 'parakeet-tdt-ctc-1.1b':

  File "/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/rnnt_decoding.py", line 510, in rnnt_decoder_predictions_tensor
    hypotheses[hyp_idx] = self.compute_rnnt_timestamps(hypotheses[hyp_idx], timestamp_type)
  File "/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/rnnt_decoding.py", line 753, in compute_rnnt_timestamps
    raise ValueError(
ValueError: `char_offsets`: [{'char': [tensor(386, dtype=torch.int32)], 'start_offset': 2, 'end_offset': 3},.....
have to be of the same length, but are: `len(offsets)`: 102 and `len(processed_tokens)`: 103

anshulwadhawan, Jun 19 '24 01:06