
funasr + Whisper语音识别-多语言-large-v3 + fsmn-vad + ct-punc-c + cam++ raises NotImplementedError: batch decoding is not implemented

Open guiniao opened this issue 1 year ago • 6 comments

Notice: In order to resolve issues more efficiently, please raise issues following the template.

❓ Questions and Help

Calling the Whisper语音识别-多语言-large-v3 model through funasr, together with fsmn-vad, ct-punc-c, and cam++, fails with the error below:

funasr version: 1.1.16.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
You are using the latest version of funasr-1.1.16
Detect model requirements, begin to install it: /data/lproot/dl/speechRec/models/whisper-more-language/requirements.txt
install model requirements successfully
Detect model requirements, begin to install it: /data/lproot/dl/speechRec/models/spk/requirements.txt
install model requirements successfully
rtf_avg: 0.038: 100%|██████████| 1/1 [00:01<00:00, 1.15s/it]
Traceback (most recent call last):
  File "/data/lproot/dl/speechRec/core/paraformer/main.py", line 42, in <module>
    res = model.generate(input=path,
  File "/home/lproot/.conda/envs/whisper/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 304, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/home/lproot/.conda/envs/whisper/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 458, in inference_with_vad
    results = self.inference(
  File "/home/lproot/.conda/envs/whisper/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 343, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/lproot/.conda/envs/whisper/lib/python3.10/site-packages/funasr/models/whisper/model.py", line 66, in inference
    raise NotImplementedError("batch decoding is not implemented")
NotImplementedError: batch decoding is not implemented

Before asking:

What is your question?

Code

from funasr import AutoModel

# paraformer_zh_path = "/models/Paraformer"
paraformer_zh_path = "models/whisper-more-language"
vad_model_path = "/models/Fsmn"
punc_model_path = "/models/punc_ct"
spk_model_path = "/models/spk"

# Load the models offline
model = AutoModel(
    model=paraformer_zh_path,
    vad_model=vad_model_path,
    punc_model=punc_model_path,
    spk_model=spk_model_path,
    #device="cuda:0",
)

path = "新录音1.m4a"

DecodingOptions = {
    "task": "transcribe",
    "language": None,
    "beam_size": None,
    "fp16": True,
    "without_timestamps": False,
    "prompt": None,
}

res = model.generate(
    input=path,
    DecodingOptions=DecodingOptions,
    batch_size_s=300,
    hotword='C16',
)
print(res)

What have you tried?

What's your environment?

  • OS (e.g., Linux): CentOS 7.9
  • FunASR Version (e.g., 1.0.0): 1.1.16
  • ModelScope Version (e.g., 1.11.0): 1.20.1
  • PyTorch Version (e.g., 2.0.0): torch 2.3.1, torchaudio 2.3.1, torchmetrics 1.4.1, pytorch-lightning 2.4.0, pytorch-metric-learning 2.6.0, pytorch-wpe 0.0.1, torch-audiomentations 0.11.1, torch-complex 0.4.4, torch-pitch-shift 1.2.4
  • How you installed funasr (pip, source): pip
  • Python version: 3.10
  • GPU (e.g., V100M32): A40
  • CUDA/cuDNN version (e.g., cuda11.7): CUDA 11.8, cuDNN 8.6.0
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
  • Any other relevant information:

guiniao · Dec 05 '24

I ran into this too. Have you solved it yet?

yyl494 · Jan 16 '25

I hit the same problem. The frustrating part is that the official FunASR docs say Whisper-large-v3 is supported.

weii918 · Mar 17 '25

Same error here. Is there a parameter to disable batch mode?

whmzsu · Mar 28 '25

I got it running after patching a version with Cursor. Below is Cursor's summary:

1. Why this error happens

The root cause is a limitation of the Whisper model implementation in the FunASR library:

  1. Batch decoding is not implemented: in FunASR's WhisperWarp class, the inference method explicitly checks the batch_size argument and raises "batch decoding is not implemented" whenever it is greater than 1, because the Whisper wrapper has no batch-decoding support (a paraphrased sketch of this guard follows the list).

  2. VAD segmentation: for long audio, FunASR first splits the input into segments with the VAD (voice activity detection) model and then feeds them to the ASR model in batches. This is very efficient for other models such as SenseVoice and Paraformer, but the Whisper model cannot consume batches.

  3. Unsuitable default parameters: the default batch_size_s is 60 seconds, which translates into a very large internal batch size (roughly 60000 after conversion to milliseconds), while the Whisper model is better suited to short audio segments.
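
For reference, here is a paraphrased sketch of the guard that the traceback points at in funasr/models/whisper/model.py. This is not the library's exact code; the method signature and the way batch_size arrives are assumptions that may differ across FunASR versions:

class WhisperWarp:
    def inference(self, data_in, key=None, **kwargs):
        # Any batch larger than a single sample is rejected outright.
        if kwargs.get("batch_size", 1) > 1:
            raise NotImplementedError("batch decoding is not implemented")
        # ... single-sample Whisper decoding would continue here ...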

2. What the patch changes

To work around this, I made the following changes (a minimal sketch of the monkey patch follows this list):

  1. A Whisper patch

    • Replaced the WhisperWarp class's inference method via a monkey patch (Monkey Patch)
    • When batch_size > 1, it no longer raises; instead it decodes each sample one by one and merges the results
  2. A cap on batch size

    • Added logic to the patch that lowers an oversized batch_size value (>100) to a reasonable one (5)
    • Set a smaller batch_size_s for the Whisper model in the ASR engine (5 seconds instead of the default 60)
  3. Tuned VAD segmentation

    • Reduced the VAD maximum segment length for the Whisper model from the default 30000 ms (30 s) to 5000 ms (5 s)
    • This lets the Whisper model process shorter audio segments, improving throughput
  4. A patch for AutoModel

    • Modified AutoModel's generate method to adjust the batching parameters automatically when a Whisper model is detected
    • This ensures the Whisper model runs with suitable parameters throughout the pipeline
  5. Parameter-passing improvements

    • Added support for a vad_max_segment_length parameter in the run_model method
    • Improved the parameter-forwarding logic so the configuration is more flexible
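
As referenced above, here is a minimal sketch of what the per-sample monkey patch might look like. It paraphrases the approach, not the actual patched code: the inference signature, the parallel data_in/key lists, and the (results, meta) return convention are assumptions based on the traceback and FunASR's usual conventions, and may differ across versions.

from funasr.models.whisper.model import WhisperWarp

_orig_inference = WhisperWarp.inference

def _inference_per_sample(self, data_in, key=None, **kwargs):
    # Decode each VAD segment on its own instead of as one batch.
    kwargs["batch_size"] = 1  # satisfy the wrapper's single-sample check
    keys = key if key is not None else [None] * len(data_in)
    results, meta = [], {}
    for sample, k in zip(data_in, keys):
        res, meta = _orig_inference(self, data_in=[sample], key=[k], **kwargs)
        results.extend(res)
    return results, meta

WhisperWarp.inference = _inference_per_sample

Applying the patch before constructing AutoModel routes every downstream call through the per-sample loop.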

Ignareo · May 06 '25

You can fix this error by setting batch_size=1, like:

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

MODEL_ROOT_DIR = ""
asr_model = AutoModel(
    model=MODEL_ROOT_DIR + "Whisper-large-v3",
    vad_model=MODEL_ROOT_DIR + "speech_fsmn_vad_zh-cn-16k-common-pytorch",
    vad_kwargs={"max_single_segment_time": 30000},
    punc_model=MODEL_ROOT_DIR + "punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    device='cuda',
    disable_update=True,
)

res = asr_model.generate(
    input=output_wav,  # output_wav: path to the audio to transcribe (defined elsewhere)
    cache={},
    language="en",
    use_itn=True,
    batch_size=1,  # HERE: force single-sample decoding for Whisper
    batch_size_s=300,
    batch_size_threshold_s=60,
    merge_vad=True,
    merge_length_s=35,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)

Since the VAD crops the audio into segments that are then sent in batches, but this ASR model cannot decode batches. (I guess?)

clairetsai1222 · Aug 12 '25

Change the batch_size_s parameter to 0 or None.
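
A minimal sketch of this suggestion, reusing the model and path from the original post; whether 0 or None is accepted here may depend on the FunASR version:

res = model.generate(
    input=path,
    batch_size_s=0,  # or None: skip duration-based batching of VAD segments
)
print(res)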

Json0926 · Sep 03 '25