funasr + Whisper-large-v3 (multilingual speech recognition) + fsmn-vad + ct-punc-c + cam++ raises NotImplementedError: batch decoding is not implemented
Notice: In order to resolve issues more efficiently, please raise issues following the template.
❓ Questions and Help
Using funasr to load the Whisper-large-v3 multilingual model together with fsmn-vad, ct-punc-c, and cam++ fails with the error below:

funasr version: 1.1.16. Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
You are using the latest version of funasr-1.1.16
Detect model requirements, begin to install it: /data/lproot/dl/speechRec/models/whisper-more-language/requirements.txt
install model requirements successfully
Detect model requirements, begin to install it: /data/lproot/dl/speechRec/models/spk/requirements.txt
install model requirements successfully
rtf_avg: 0.038: 100%|████████| 1/1 [00:01<00:00, 1.15s/it]
Traceback (most recent call last):
  File "/data/lproot/dl/speechRec/core/paraformer/main.py", line 42, in <module>
    res = model.generate(input=path,
  File "/home/lproot/.conda/envs/whisper/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 304, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/home/lproot/.conda/envs/whisper/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 458, in inference_with_vad
    results = self.inference(
  File "/home/lproot/.conda/envs/whisper/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 343, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/lproot/.conda/envs/whisper/lib/python3.10/site-packages/funasr/models/whisper/model.py", line 66, in inference
    raise NotImplementedError("batch decoding is not implemented")
NotImplementedError: batch decoding is not implemented
What is your question?
Code
from funasr import AutoModel

# paraformer_zh_path = "/models/Paraformer"
paraformer_zh_path = "models/whisper-more-language"
vad_model_path = "/models/Fsmn"
punc_model_path = "/models/punc_ct"
spk_model_path = "/models/spk"

# Load the models offline
model = AutoModel(
    model=paraformer_zh_path,
    vad_model=vad_model_path,
    punc_model=punc_model_path,
    spk_model=spk_model_path,
    # device="cuda:0",
)

path = "新录音1.m4a"
DecodingOptions = {
    "task": "transcribe",
    "language": None,
    "beam_size": None,
    "fp16": True,
    "without_timestamps": False,
    "prompt": None,
}
res = model.generate(
    input=path,
    DecodingOptions=DecodingOptions,
    batch_size_s=300,
    hotword='C16',
)
print(res)
What have you tried?
What's your environment?
- OS (e.g., Linux): CentOS 7.9
- FunASR Version (e.g., 1.0.0): 1.1.16
- ModelScope Version (e.g., 1.11.0): 1.20.1
- PyTorch Version (e.g., 2.0.0): torch 2.3.1, torchaudio 2.3.1, pytorch-lightning 2.4.0, pytorch-metric-learning 2.6.0, pytorch-wpe 0.0.1, torch-audiomentations 0.11.1, torch-complex 0.4.4, torch-pitch-shift 1.2.4, torchmetrics 1.4.1
- How you installed funasr (pip, source): pip
- Python version: 3.10
- GPU (e.g., V100M32): A40
- CUDA/cuDNN version (e.g., cuda11.7): 11.8 / 8.6.0
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1):
- Any other relevant information:
I'm running into this too. Have you solved it yet?
I hit the same problem. Frustrating, since the official FunASR docs say Whisper-large-v3 is supported.
Same error here. Is there a parameter to disable batch mode?
I got it running after patching it with Cursor; below is Cursor's summary of the changes.
1. Why this problem occurs
The root cause is a limitation of the Whisper model implementation in the FunASR library:
- Batch decoding is not implemented: in FunASR's WhisperWarp class, the inference method explicitly checks the batch size and raises "batch decoding is not implemented" whenever it is greater than 1, because the Whisper implementation does not support batched decoding (see the snippet after this list).
- VAD segmentation: for long audio, FunASR first splits the input into segments with the VAD (voice activity detection) model and then feeds the segments to the ASR model in batches. This is efficient for other models (such as SenseVoice and Paraformer), but the Whisper model does not support it.
- Unsuitable default parameters: the default batch_size_s is 60 seconds, which yields a very large batch size (about 60000 after the internal conversion to milliseconds), whereas the Whisper model is better suited to shorter audio segments.
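For reference, the check that raises this error looks roughly like the following (paraphrased from the traceback; the exact code in funasr/models/whisper/model.py may differ between funasr versions):

# Inside WhisperWarp.inference (funasr/models/whisper/model.py, around line 66):
if kwargs.get("batch_size", 1) > 1:
    raise NotImplementedError("batch decoding is not implemented")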
2. Changes made to fix it
To solve the problem, I implemented the following optimizations (a sketch of the core patch appears after this list):
- Whisper patch: monkey-patched the inference method of the WhisperWarp class so that when batch_size > 1 it no longer raises an error but instead processes each sample individually and merges the results.
- Capped batch size: added logic in the patch to lower an excessively large batch_size (> 100) to a reasonable value (5), and set a smaller batch_size_s for the Whisper model in the ASR engine (5 seconds instead of the default 60).
- Tuned VAD segmentation: reduced the VAD maximum segment length for Whisper from the default 30000 ms (30 s) to 5000 ms (5 s), so the Whisper model processes shorter audio segments more efficiently.
- Patched AutoModel: modified AutoModel's generate method to adjust the batching parameters automatically whenever a Whisper model is detected, ensuring Whisper uses suitable settings throughout the pipeline.
- Optimized parameter passing: added support for a vad_max_segment_length parameter in the run_model method and improved the parameter-passing logic to make configuration more flexible.
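Here is a minimal, untested sketch of the monkey patch described above. It is based only on the module path visible in the traceback (funasr/models/whisper/model.py); the WhisperWarp.inference signature and its (results, meta) return convention are assumptions, so verify both against your installed funasr version:

import funasr.models.whisper.model as whisper_model

_orig_inference = whisper_model.WhisperWarp.inference

def _sequential_inference(self, data_in, *args, **kwargs):
    # Force per-sample decoding so the batch_size > 1 guard never fires.
    kwargs["batch_size"] = 1
    if isinstance(data_in, (list, tuple)) and len(data_in) > 1:
        keys = kwargs.pop("key", None) or [None] * len(data_in)
        merged = []
        for sample, key in zip(data_in, keys):
            out = _orig_inference(self, [sample], *args, key=[key], **kwargs)
            # funasr models usually return (results, meta_data); keep the results.
            merged.extend(out[0] if isinstance(out, tuple) else out)
        return merged, {}
    return _orig_inference(self, data_in, *args, **kwargs)

whisper_model.WhisperWarp.inference = _sequential_inference

Combined with a shorter VAD segment length, e.g. vad_kwargs={"max_single_segment_time": 5000} when constructing AutoModel, this lets the VAD pipeline run while Whisper decodes each segment on its own.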
You can fix this error by setting batch_size=1, like:

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

MODEL_ROOT_DIR = ""
asr_model = AutoModel(
    model=MODEL_ROOT_DIR + "Whisper-large-v3",
    vad_model=MODEL_ROOT_DIR + "speech_fsmn_vad_zh-cn-16k-common-pytorch",
    vad_kwargs={"max_single_segment_time": 30000},
    punc_model=MODEL_ROOT_DIR + "punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    device='cuda',
    disable_update=True,
)
res = asr_model.generate(
    input=output_wav,  # path to the audio file to transcribe
    cache={},
    language="en",
    use_itn=True,
    batch_size=1,  # HERE
    batch_size_s=300,
    batch_size_threshold_s=60,
    merge_vad=True,
    merge_length_s=35,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)
Since the VAD crops the audio into segments that are then batched, but the ASR model cannot decode in batches. (I guess?)
Set the batch_size_s parameter to 0 or None.
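A quick sketch of that suggestion (untested; whether 0 or None actually disables batching may depend on your funasr version):

res = model.generate(input=path, batch_size_s=0)  # or batch_size_s=None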