eval_beamsearch_ngram_ctc throws "got an unexpected keyword argument 'logprobs'"

Open carlfm01 opened this issue 1 year ago • 3 comments

Unable to use KenLM rescoring because transcribe() does not accept the logprobs argument.

Steps/Code to reproduce the bug

  1. Cloned the repo at commit 7916269.
  2. Ran the script scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py.

It throws the following error: TypeError: EncDecCTCModel.transcribe() got an unexpected keyword argument 'logprobs'.

  3. Applied the change suggested in https://github.com/NVIDIA/NeMo/issues/8884#issuecomment-2049063944.
logits_hyps = asr_model.transcribe(
    self.audio_file_list, batch_size=self.asr_batch_size, return_hypotheses=True
)  # type: List[nemo_asr.parts.Hypothesis]

# Use the alignments stored on each hypothesis in place of the old logprobs output
logits = [hyp.alignments for hyp in logits_hyps]

With this change, a new error occurs:

[NeMo I 2024-08-13 19:24:30 ctc_decoding:359] Beam search requires that consecutive CTC tokens are not folded. 
    Overriding provided value of `fold_consecutive` = True to False
Segmentation fault (core dumped)
  4. I thought it was KenLM failing, so I applied the fix from https://github.com/flashlight/wav2letter/issues/875, but it did not work.
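
For readers unfamiliar with the log message above: "folding" refers to the standard CTC collapse step, where runs of identical consecutive tokens are merged before blanks are removed. Below is a minimal Python sketch of that step. It is not NeMo's implementation; the function name ctc_collapse and the blank id of 0 are assumptions chosen for illustration.

```python
from itertools import groupby
from typing import List


def ctc_collapse(token_ids: List[int], blank_id: int, fold_consecutive: bool = True) -> List[int]:
    """Turn a per-frame CTC token sequence into an output token sequence."""
    if fold_consecutive:
        # Merge each run of identical consecutive tokens into a single token.
        token_ids = [tok for tok, _ in groupby(token_ids)]
    # Blanks are dropped whether or not folding was applied.
    return [tok for tok in token_ids if tok != blank_id]


frames = [1, 1, 0, 1, 2, 2, 0]  # 0 is the blank in this toy example
print(ctc_collapse(frames, blank_id=0))                          # [1, 1, 2]
print(ctc_collapse(frames, blank_id=0, fold_consecutive=False))  # [1, 1, 1, 2, 2]
```

A beam search decoder already returns token sequences rather than per-frame labels, so folding its output again would merge tokens that are legitimately repeated. That is presumably why the setting is overridden to False.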

I searched further through the docs, PRs, and comments and found the karpnv/beamsearch branch from https://github.com/NVIDIA/NeMo/pull/8428. Even on that branch, I'm still unable to search over alpha and beta values.

The command:

python eval_beamsearch_ngram_ctc.py \
  model_path=/media/carlos/asr/Conformer-CTC-BPE.nemo \
  dataset_manifest=/media/carlos/asr/asr/supervised/test-files/test-all.json \
  preds_output_folder=preds/ \
  cache_file=null \
  ctc_decoding.beam.kenlm_path=/media/carlos/asr/conformerlm3.binary \
  ctc_decoding.beam.flashlight_cfg.lexicon_path=/media/carlos/asr/conformerlm3.binary.tmp.lexicon \
  ctc_decoding.beam.beam_size=[100,200,500] \
  ctc_decoding.beam.beam_alpha=[1,2,3,4] \
  ctc_decoding.beam.beam_beta=[1,2,3,4] \
  ctc_decoding.strategy=flashlight

Shows this error:

    raise NotImplementedError("Wrong parameter combination")
NotImplementedError: Wrong parameter combination
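
For context, the command above asks for a sweep over every combination of beam_size, beam_alpha, and beam_beta, keeping the combination with the lowest WER. Below is a minimal Python sketch of that search. It is not the script's actual code: evaluate_wer is a hypothetical callable standing in for "decode the test set with these settings and return the WER", and toy_wer is a made-up function used only to make the example runnable.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

Params = Tuple[int, float, float]  # (beam_size, beam_alpha, beam_beta)


def grid_search(
    beam_sizes: Iterable[int],
    alphas: Iterable[float],
    betas: Iterable[float],
    evaluate_wer: Callable[[int, float, float], float],
) -> Tuple[Params, float]:
    """Try every parameter combination and return the one with the lowest WER."""
    best_params, best_wer = None, float("inf")
    for beam_size, alpha, beta in product(beam_sizes, alphas, betas):
        wer = evaluate_wer(beam_size, alpha, beta)
        if wer < best_wer:
            best_params, best_wer = (beam_size, alpha, beta), wer
    if best_params is None:
        raise ValueError("empty search grid")
    return best_params, best_wer


def toy_wer(beam_size: int, alpha: float, beta: float) -> float:
    # Made-up stand-in for a real decoding run; lowest at alpha=2, beta=3.
    return 0.10 + 0.01 * abs(alpha - 2) + 0.01 * abs(beta - 3) + 1.0 / beam_size


best, wer = grid_search([100, 200, 500], [1, 2, 3, 4], [1, 2, 3, 4], toy_wer)
print(best)  # (500, 2, 3)
```

With the value lists from the command, the grid has 3 x 4 x 4 = 48 combinations, each requiring a full decoding pass over the test set.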

Is there an updated guide, or can anyone help? It would be greatly appreciated. I can provide extra details if needed.

Expected behavior

Being able to search for the alpha and beta values using the generated KenLM model. We need those values to use Riva.

Environment overview

  • Environment location: Bare-metal
  • Method of NeMo install: From source using ./reinstall.sh

Environment details

  • OS version: Ubuntu 22.04.3 LTS
  • PyTorch version: 2.4.0+cu121
  • Python version: 3.10.12

Additional context

  • GPU model: Nvidia A100

carlfm01 avatar Aug 16 '24 02:08 carlfm01

The fix from #8884 worked for me. See this commit.

However, I am using decoding_strategy="beam" without KenLM.

aklemen avatar Aug 17 '24 10:08 aklemen

> The fix from #8884 worked for me. See this commit.

It works only without using KenLM. I need KenLM to find the optimal alpha and beta values to deploy it to RIVA.

carlfm01 avatar Aug 19 '24 05:08 carlfm01

@karpnv, can you review this?

titu1994 avatar Aug 22 '24 17:08 titu1994

Any updates? I'm still blocked and unable to deploy to Riva.

carlfm01 avatar Sep 02 '24 07:09 carlfm01

I'm having the same issue. What worked for me was implementing the same changes mentioned by @aklemen and changing _wer in this line to wer. I'm using KenLM with default parameters.

MedAymenF avatar Sep 27 '24 16:09 MedAymenF

> I'm having the same issue. What worked for me was implementing the same changes mentioned by @aklemen and changing _wer in this line to wer. I'm using KenLM with default parameters.

Thanks for the suggestion, but I’ve already tried those changes before, and I’m still facing the same issue. It works perfectly in Riva, but I can't get it to work in NeMo.

carlfm01 avatar Sep 29 '24 21:09 carlfm01

Hi @karpnv, any update regarding this? I'm facing the same issue.

mehadi92 avatar Oct 13 '24 08:10 mehadi92

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Nov 13 '24 01:11 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Nov 20 '24 02:11 github-actions[bot]

Any update? As it stands, it's impossible to test a NeMo model with NeMo tools before deploying it on Riva. Is NeMo still being maintained?

carlfm01 avatar Feb 09 '25 05:02 carlfm01