Recognition results differ drastically across audio files in the WebSocket service
Notice: To resolve issues more efficiently, please raise your issue following the template and include the relevant details.
❓ Questions and Help
Before asking:
- search the issues.
- search the docs.
What is your question?
Why do recognition results vary so much across different audio samples when using the code provided in the repository?
Code
code path: FunASR/websocket/funasr_client_api.py
wav_path 1: FunASR-main\runtime\funasr_api\asr_example.wav
outputs:
connect to url ws://127.0.0.1:10095
send json {"mode": "2pass", "chunk_size": [0, 10, 5], "encoder_chunk_look_back": 4, "decoder_chunk_look_back": 1, "chunk_interval": 10, "wav_name": "default", "is_speaking": true}
text {'mode': '2pass-online', 'text': '欢迎大', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '家来', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '体验达', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '摩院推', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '出的语', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '音识', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '别模型', 'wav_name': 'default', 'is_final': True}
wav_path 2: FunASR-main\runtime\funasr_api\SSB00050002.wav
outputs:
connect to url ws://127.0.0.1:10095
send json {"mode": "2pass", "chunk_size": [0, 10, 5], "encoder_chunk_look_back": 4, "decoder_chunk_look_back": 1, "chunk_interval": 10, "wav_name": "default", "is_speaking": true}
text {'mode': '2pass-online', 'text': '群', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '你', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-offline', 'text': '春日,天寒。', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '嗯', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-offline', 'text': '电话。', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '嗯', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '嗯', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-offline', 'text': '我的勇器。', 'wav_name': 'default', 'is_final': True}
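For reference, the exchange shown in the logs can be reproduced with a minimal client along these lines. This is a sketch, not the actual `funasr_client_api.py`: the URI and the init JSON are copied from the logs above, while the raw-PCM framing and the `{"is_speaking": false}` end-of-stream message are assumptions about the server protocol.

```python
# Minimal client sketch reproducing the handshake in the logs above.
# Assumptions (not taken from funasr_client_api.py): the server accepts raw
# 16 kHz 16-bit mono PCM frames after the init JSON, and a final
# {"is_speaking": false} message marks end of stream.
import asyncio
import json

import soundfile
import websockets


async def stream_wav(uri: str, wav_path: str) -> None:
    speech, sample_rate = soundfile.read(wav_path, dtype="int16")
    assert sample_rate == 16000, "expected 16 kHz mono input"

    async with websockets.connect(uri) as ws:
        # Init message copied verbatim from the logs above.
        await ws.send(json.dumps({
            "mode": "2pass",
            "chunk_size": [0, 10, 5],
            "encoder_chunk_look_back": 4,
            "decoder_chunk_look_back": 1,
            "chunk_interval": 10,
            "wav_name": "default",
            "is_speaking": True,
        }))
        stride = 9600  # 600 ms of 16 kHz audio per chunk
        for i in range(0, len(speech), stride):
            await ws.send(speech[i:i + stride].tobytes())
        await ws.send(json.dumps({"is_speaking": False}))  # assumed end marker

        # Print results until the server closes the connection.
        async for message in ws:
            print(json.loads(message))


asyncio.run(stream_wav("ws://127.0.0.1:10095", "asr_example.wav"))
```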
What have you tried?
What's your environment?
- OS (e.g., Linux):
- FunASR Version (e.g., 1.0.0):
- ModelScope Version (e.g., 1.11.0):
- PyTorch Version (e.g., 2.0.0):
- How you installed funasr (pip, source):
- Python version:
- GPU (e.g., V100M32):
- CUDA/cuDNN version (e.g., cuda11.7):
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1):
- Any other relevant information:
The result for the SSB dataset is unacceptable. I did not change any configuration. Does anyone know the reason? Thanks.
How is this streaming response implemented? I would appreciate an explanation.
@Akmend Use the code examples shown in this project. Keep in mind that the input wav file must have a 16 kHz sampling rate (a quick way to check and convert is sketched after the example below).
```python
import soundfile
from funasr import AutoModel

model_path = r"data/model_hub/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online"
wav_path = r"/home/workspace/lwm/AwesomeCode/FunASR/SSB00050001.wav"

chunk_size = [0, 10, 5]      # [0, 10, 5] = 600 ms, [0, 8, 4] = 480 ms
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(model=model_path)

# wav_path is absolute, so it can be read directly.
speech, sample_rate = soundfile.read(wav_path)
chunk_stride = chunk_size[1] * 960  # 600 ms of 16 kHz audio

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(
        input=speech_chunk,
        cache=cache,
        is_final=is_final,
        chunk_size=chunk_size,
        encoder_chunk_look_back=encoder_chunk_look_back,
        decoder_chunk_look_back=decoder_chunk_look_back,
    )
    print(res)
```
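Since the streaming model expects 16 kHz input, it is worth verifying the wav files before feeding them in. A minimal sketch, assuming `scipy` is available; the file names are placeholders:

```python
# Check that a wav file is 16 kHz mono and convert it if not (sketch only;
# scipy's polyphase resampler is one of several reasonable choices).
import soundfile
from scipy.signal import resample_poly

speech, sr = soundfile.read("input.wav")
if speech.ndim > 1:
    speech = speech.mean(axis=1)  # downmix to mono
if sr != 16000:
    speech = resample_poly(speech, 16000, sr)  # resample to 16 kHz
soundfile.write("input_16k.wav", speech, 16000)
```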
Could you show the server code?