FunASR
Recognizing .pcm audio (16 kHz, mono, 16-bit) fails with "Failed to load audio"
🐛 Bug
Recognizing PCM audio fails with the error "Failed to load audio".
To Reproduce
Pass a raw audio file with the .pcm extension (16 kHz, mono, 16-bit) to model.generate; loading fails with "Failed to load audio".
Code sample
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# test.pcm holds raw 16 kHz, mono, 16-bit samples; this call fails with "Failed to load audio"
res = model.generate(
    input="test.pcm",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)
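One possible workaround, sketched under the assumption that loading fails because a headerless .pcm file carries no sample-rate or format metadata: wrap the raw samples in a WAV header with Python's standard `wave` module and pass the resulting .wav path instead. `pcm_to_wav` is a hypothetical helper, not part of FunASR.

```python
import wave
from pathlib import Path


def pcm_to_wav(pcm_path, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap headerless PCM samples in a WAV container (hypothetical helper).

    Assumption: the .pcm file holds little-endian signed integer samples,
    sample_width bytes each (2 = 16-bit), interleaved if channels > 1.
    """
    raw = Path(pcm_path).read_bytes()
    frame_size = channels * sample_width
    # A truncated file would leave a partial frame; reject it explicitly.
    if len(raw) % frame_size != 0:
        raise ValueError(
            f"{pcm_path}: {len(raw)} bytes is not a whole number of "
            f"{frame_size}-byte frames"
        )
    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        # The wave module writes the RIFF/WAVE header, then these samples.
        wav.writeframes(raw)
    return wav_path
```

With this helper, `pcm_to_wav("test.pcm", "test.wav")` followed by `input="test.wav"` in the `model.generate` call above keeps every other argument unchanged.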
Expected behavior
The .pcm file should be loaded as 16 kHz, mono, 16-bit audio and transcribed normally.
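Loading a .pcm file "as 16 kHz, mono, 16-bit" amounts to reading the bytes as signed 16-bit integers and scaling them to floating point. A minimal standard-library sketch of that step follows; `decode_pcm16` is a hypothetical helper, and little-endian byte order is an assumption because the report does not state it.

```python
import sys
from array import array


def decode_pcm16(data: bytes) -> list[float]:
    """Decode headerless 16-bit mono PCM into floats in [-1.0, 1.0) (hypothetical helper)."""
    if len(data) % 2 != 0:
        raise ValueError("16-bit PCM needs an even number of bytes")
    samples = array("h")  # signed short
    if samples.itemsize != 2:
        raise RuntimeError("this platform's 'h' type is not 2 bytes")
    samples.frombytes(data)
    # Assumption: the file is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()
    # Map the int16 range [-32768, 32767] onto [-1.0, 1.0).
    return [s / 32768.0 for s in samples]
```

Assuming `AutoModel.generate` accepts an in-memory waveform (for example a float32 NumPy array built with `numpy.asarray(samples, dtype=numpy.float32)`), the decoded samples could be passed as `input` in place of the file path; this has not been verified against FunASR 1.2.7.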
Environment
- OS: Ubuntu 22.04.3 LTS
- Python: 3.10.12
- FunASR: 1.2.7
- ModelScope: 1.30.0
- PyTorch: 2.8.0
- CUDA: True, Version: 12.8
- GPU: Quadro RTX 4000