"FileNotFoundError: KenLM binary file not found at : None" thrown when decoding without N-gram LM
Describe the bug
I am trying to use an external LLM to rescore the beam search results of a Conformer-CTC model.
When I run eval_beamsearch_ngram_ctc.py to get the beam search results without passing an N-gram LM, I get the following error:
Traceback (most recent call last):
  File "/content/NeMo/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py", line 415, in main
    candidate_wer, candidate_cer = beam_search_eval(
  File "/content/NeMo/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py", line 196, in beam_search_eval
    _, beams_batch = decoding.ctc_decoder_predictions_tensor(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/submodules/ctc_decoding.py", line 319, in ctc_decoder_predictions_tensor
    hypotheses_list = self.decoding(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py", line 166, in __call__
    return self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nemo/core/classes/common.py", line 1098, in __call__
    outputs = wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py", line 280, in forward
    hypotheses = self.search_algorithm(prediction_tensor, out_len)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py", line 314, in default_beam_search
    raise FileNotFoundError(
FileNotFoundError: KenLM binary file not found at : None. Please set a valid path in the decoding config.
Steps/Code to reproduce bug
- Install decoders.
NEMO_PATH=<insert absolute path to NeMo directory>
cd $NEMO_PATH && bash scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh $NEMO_PATH
- Run the beam search with the following config:
python3 $NEMO_PATH/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py \
nemo_model_file="<nemo CTC ASR model, e.g. stt_en_conformer_ctc_medium.nemo>" \
input_manifest="<manifest json file>" \
preds_output_folder="<output directory>" \
decoding_mode=beamsearch \
decoding_strategy="beam"
Expected behavior
I would expect no error to be thrown, because BeamSearchDecoderWithLM already handles the case where no N-gram LM path is passed:
# from nemo/collections/asr/modules/beam_search_decoder.py
if lm_path is not None:
    self.scorer = Scorer(alpha, beta, model_path=lm_path, vocabulary=vocab)
else:
    self.scorer = None
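To illustrate why an optional scorer is enough for LM-free decoding, here is a minimal sketch of hypothesis scoring that adds the LM terms only when an LM is present. The function name hypothesis_score, the lm_logp callable, and the toy LM are hypothetical, and the formula is the commonly used shallow-fusion form (acoustic score + alpha * LM score + beta * word count), not necessarily what NeMo's Scorer computes internally.

```python
from typing import Callable, Optional


def hypothesis_score(
    acoustic_logp: float,
    text: str,
    lm_logp: Optional[Callable[[str], float]] = None,  # hypothetical stand-in for an LM scorer
    alpha: float = 1.0,
    beta: float = 0.0,
) -> float:
    """Score a beam hypothesis, adding LM terms only when an LM is available."""
    if lm_logp is None:
        # No LM configured: rank hypotheses by the acoustic score alone.
        return acoustic_logp
    word_count = len(text.split())
    return acoustic_logp + alpha * lm_logp(text) + beta * word_count


# Without an LM, the score is just the acoustic log-probability.
no_lm = hypothesis_score(-3.2, "hello world")


def toy_lm(text: str) -> float:
    """Purely illustrative LM: a fixed penalty of 0.5 per word."""
    return -0.5 * len(text.split())


# With the toy LM, the alpha-weighted LM score and beta-weighted word count are added.
with_lm = hypothesis_score(-3.2, "hello world", lm_logp=toy_lm, alpha=2.0, beta=1.0)
```

With this pattern the beam search itself does not need to know whether an LM was configured. It only ever asks for a score.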
When I removed the following check for the KenLM file path from nemo/collections/asr/parts/submodules/ctc_beam_decoding.py, decoding worked:
# Check for filepath
if self.kenlm_path is None or not os.path.exists(self.kenlm_path):
    raise FileNotFoundError(
        f"KenLM binary file not found at : {self.kenlm_path}. "
        f"Please set a valid path in the decoding config."
    )
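Rather than removing the check entirely, one option might be to raise only when a path was given but does not exist. The sketch below is a standalone illustration under that assumption. The helper name resolve_kenlm_path is hypothetical and is not part of NeMo, and treating an empty string the same as None is also an assumption.

```python
import os
from typing import Optional


def resolve_kenlm_path(kenlm_path: Optional[str]) -> Optional[str]:
    """Return a usable KenLM path, or None when no LM was requested.

    Hypothetical helper: None and "" both mean "decode without an N-gram LM".
    """
    if not kenlm_path:
        # No LM requested, so there is nothing to validate.
        return None
    if not os.path.exists(kenlm_path):
        # A path was given but is wrong: keep the original error message.
        raise FileNotFoundError(
            f"KenLM binary file not found at : {kenlm_path}. "
            f"Please set a valid path in the decoding config."
        )
    return kenlm_path
```

This keeps the helpful error for a mistyped path while allowing the no-LM case to pass through to the decoder.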
Environment overview
- Environment location: Google Colab
- Method of NeMo install:
python -m pip install git+https://github.com/NVIDIA/NeMo.git@<branch>#egg=nemo_toolkit[all]
Environment details
- OS version: Ubuntu 22.04.4 LTS
- PyTorch version: 2.2.1+cu121
- Python version: 3.10
Additional context
GPU: T4
Update: We found that this script needs a couple of code changes because of recent updates made during the model and transcription refactoring. @karpov-nick is working on a fix.
The work is in progress in PR https://github.com/NVIDIA/NeMo/pull/8428
Thank you both!
You can try decoding without an N-gram LM on the branch karpnv/beamsearch with the following parameters:
python3 ./scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py \
model_path=./am_model.nemo \
dataset_manifest=./manifest.json \
preds_output_folder=/tmp \
ctc_decoding.strategy=flashlight \
ctc_decoding.beam.nemo_kenlm_path="" \
ctc_decoding.beam.beam_size=[4] \
ctc_decoding.beam.beam_beta=[0.5]
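The list-valued beam_size and beam_beta arguments above suggest that the script evaluates every combination of the listed values. That reading is an assumption here, and the helper below is a hypothetical sketch of how such a sweep can be expanded, not the script's actual code.

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Expand list-valued parameters into one dict per combination."""
    names = list(params)
    value_lists = (params[name] for name in names)
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Two values per parameter give four decoding configurations to evaluate.
grid = expand_grid({"beam_size": [4, 8], "beam_beta": [0.5, 1.0]})
for config in grid:
    print(config)
```

With single-element lists, as in the command above, the grid collapses to a single configuration.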
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.