
Websocket service: recognition results differ drastically across different audio files

Open · WeiminLee opened this issue 1 year ago · 4 comments

Notice: In order to resolve issues more efficiently, please raise the issue following the template and include details.

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Why do the recognition results vary significantly when using the code provided in the repository for different audio samples?

Code

code path: FunASR/websocket/funasr_client_api.py

wav path 1: FunASR-main\runtime\funasr_api\asr_example.wav

outputs:

connect to url ws://127.0.0.1:10095
send json {"mode": "2pass", "chunk_size": [0, 10, 5], "encoder_chunk_look_back": 4, "decoder_chunk_look_back": 1, "chunk_interval": 10, "wav_name": "default", "is_speaking": true}
text {'mode': '2pass-online', 'text': '欢迎大', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '家来', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '体验达', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '摩院推', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '出的语', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '音识', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '别模型', 'wav_name': 'default', 'is_final': True}

wav path 2: FunASR-main\runtime\funasr_api\SSB00050002.wav

outputs:

connect to url ws://127.0.0.1:10095
send json {"mode": "2pass", "chunk_size": [0, 10, 5], "encoder_chunk_look_back": 4, "decoder_chunk_look_back": 1, "chunk_interval": 10, "wav_name": "default", "is_speaking": true}
text {'mode': '2pass-online', 'text': '群', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '你', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-offline', 'text': '春日,天寒。', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '嗯', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-offline', 'text': '电话。', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '嗯', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-online', 'text': '嗯', 'wav_name': 'default', 'is_final': True}
text {'mode': '2pass-offline', 'text': '我的勇器。', 'wav_name': 'default', 'is_final': True}
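For context, the logs above show the client first sending a configuration JSON and then streaming raw PCM over the websocket. Below is a rough sketch of that exchange using the `websockets` library; the config fields are copied from the output above, but the chunking loop and the end-of-stream message are assumptions about the protocol, not the repository's actual client code.

```python
import asyncio
import json

import soundfile
import websockets  # pip install websockets


async def stream_wav(uri: str, wav_path: str):
    """Send the 2pass config, stream 16 kHz PCM chunks, and print results."""
    async with websockets.connect(uri) as ws:
        # Configuration message, copied from the output above.
        config = {
            "mode": "2pass",
            "chunk_size": [0, 10, 5],  # [0, 10, 5] -> 600 ms chunks
            "encoder_chunk_look_back": 4,
            "decoder_chunk_look_back": 1,
            "chunk_interval": 10,
            "wav_name": "default",
            "is_speaking": True,
        }
        await ws.send(json.dumps(config))

        speech, sample_rate = soundfile.read(wav_path, dtype="int16")
        assert sample_rate == 16000, "the server expects 16 kHz audio"

        stride = 960 * config["chunk_size"][1]  # samples per 600 ms chunk
        for i in range(0, len(speech), stride):
            await ws.send(speech[i:i + stride].tobytes())

        # Assumption: end of stream is signaled by repeating the JSON message
        # with is_speaking set to False.
        await ws.send(json.dumps({"is_speaking": False}))

        # Print recognition messages until the server closes the connection.
        async for message in ws:
            print(json.loads(message))


# asyncio.run(stream_wav("ws://127.0.0.1:10095", "asr_example.wav"))
```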

What have you tried?

What's your environment?

  • OS (e.g., Linux):
  • FunASR Version (e.g., 1.0.0):
  • ModelScope Version (e.g., 1.11.0):
  • PyTorch Version (e.g., 2.0.0):
  • How you installed funasr (pip, source):
  • Python version:
  • GPU (e.g., V100M32):
  • CUDA/cuDNN version (e.g., cuda11.7):
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1):
  • Any other relevant information:

WeiminLee · Jul 15 '24 03:07

The result for the SSB dataset is unacceptable. I did not change any configuration. Does anyone know the reason? Thanks.

WeiminLee · Jul 15 '24 03:07

May I ask how you implemented this streaming response? Please advise.

Akmend · Jul 19 '24 01:07

@Akmend Use the code example shown in this project. Keep in mind that the input wav file must have a 16 kHz sampling rate.

```python
import os

import soundfile
from funasr import AutoModel

model_path = r"data/model_hub/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online"
wav_path = r"/home/workspace/lwm/AwesomeCode/FunASR/SSB00050001.wav"

chunk_size = [0, 10, 5]      # [0, 10, 5] -> 600 ms chunks, [0, 8, 4] -> 480 ms chunks
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(model=model_path)

# wav_path is absolute here, so os.path.join simply returns it unchanged.
wav_file = os.path.join(model.model_path, wav_path)
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 10 * 960 samples = 600 ms at 16 kHz

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(
        input=speech_chunk,
        cache=cache,
        is_final=is_final,
        chunk_size=chunk_size,
        encoder_chunk_look_back=encoder_chunk_look_back,
        decoder_chunk_look_back=decoder_chunk_look_back,
    )
    print(res)
```
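Since the streaming model expects 16 kHz input, a wav file recorded at a different rate needs to be resampled first. A minimal sketch, assuming scipy is installed and using illustrative file names:

```python
import soundfile
from scipy.signal import resample_poly

TARGET_SR = 16000

speech, sr = soundfile.read("input.wav")
if sr != TARGET_SR:
    # Polyphase resampling; up/down factors come from the two sample rates.
    speech = resample_poly(speech, up=TARGET_SR, down=sr)
soundfile.write("input_16k.wav", speech, TARGET_SR)
```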

WeiminLee · Jul 20 '24 01:07

Show me the server code.

LauraGPT · Jul 22 '24 06:07