Why does the STT still receive a lot of silent audio?
Feature Type
Nice to have
Feature Description
The VAD should filter out silent audio, but the STT module I implemented still receives silent audio. Why is that?
Workarounds / Alternatives
No response
Additional Context
No response
Hi, do you mean you pre-process audio in stt_node? Could you share more details on what you are trying to accomplish?
The STT module comes after the VAD, so in theory the STT module shouldn't receive large amounts of silent audio, right?
Are you using a streaming or a non-streaming STT? With a streaming STT, all audio frames are sent to the STT. If it's a non-streaming STT, could you share some of the audio clips the STT received so we can better understand the issue?
I'm using a streaming STT. How do I tell silent audio apart from audio that contains speech?
If it's a streaming STT, the STT itself is responsible for detecting speech and the end of the user's turn. Otherwise, you can use non-streaming mode:
- streaming: all audio frames are sent to the STT as a stream, and the STT returns transcripts whenever it detects speech
- non-streaming: only the speech clips detected by the VAD are sent to the STT
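To make the difference between the two modes concrete, here is a minimal, framework-independent sketch. It is not the framework's implementation: `rms`, `is_speech`, `frames_seen_by_stt`, and `tone` are hypothetical helpers, the audio is assumed to be 16-bit little-endian mono PCM, and the energy threshold is a toy stand-in for a real VAD.

```python
import math
import struct
from typing import Callable, Iterable, List

Frame = bytes  # assumption: one frame of 16-bit little-endian mono PCM


def rms(frame: Frame) -> float:
    """Root-mean-square amplitude of a 16-bit PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def is_speech(frame: Frame, threshold: float = 500.0) -> bool:
    """Toy VAD: any frame louder than the (arbitrary) threshold counts as speech."""
    return rms(frame) > threshold


def frames_seen_by_stt(
    frames: Iterable[Frame],
    streaming: bool,
    vad: Callable[[Frame], bool] = is_speech,
) -> List[bytes]:
    """Return what the STT would receive in each mode."""
    if streaming:
        # Streaming: every frame is forwarded, silence included.
        return list(frames)

    # Non-streaming: consecutive speech frames are grouped into clips,
    # and silent frames are dropped.
    clips: List[bytes] = []
    current: List[Frame] = []
    for frame in frames:
        if vad(frame):
            current.append(frame)
        elif current:
            clips.append(b"".join(current))
            current = []
    if current:
        clips.append(b"".join(current))
    return clips


def tone(amplitude: int, samples: int = 160) -> Frame:
    """Build a constant-amplitude test frame."""
    return struct.pack(f"<{samples}h", *([amplitude] * samples))


silence, voice = tone(0), tone(4000)
frames = [silence, voice, voice, silence, silence, voice]
```

With these test frames, the streaming path hands the STT all six frames, while the non-streaming path hands it only two speech clips. A streaming STT that wants to skip silence would need its own check along the lines of `is_speech`.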
How do I set up or configure non-streaming mode?
Here is my code
If it's a custom STT, this is defined via its capabilities, for example by setting STTCapabilities.streaming=False.
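As a rough illustration of how such a capability flag is typically declared and consulted, here is a sketch using stand-in classes. `Capabilities`, `BaseSTT`, `MyCustomSTT`, and `needs_vad_segmentation` are hypothetical names that only mirror the idea of `STTCapabilities.streaming=False`. They are not the framework's real classes, whose constructor and method signatures may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical stand-in for the framework's STTCapabilities."""

    streaming: bool
    interim_results: bool = False


class BaseSTT:
    """Hypothetical base class that stores the declared capabilities."""

    def __init__(self, capabilities: Capabilities) -> None:
        self.capabilities = capabilities

    def recognize(self, clip: bytes) -> str:
        raise NotImplementedError


class MyCustomSTT(BaseSTT):
    """A custom STT that declares itself as non-streaming."""

    def __init__(self) -> None:
        super().__init__(Capabilities(streaming=False))

    def recognize(self, clip: bytes) -> str:
        # Placeholder: a real implementation would call a recognition backend.
        return f"<transcript of {len(clip)} bytes>"


def needs_vad_segmentation(stt: BaseSTT) -> bool:
    """A pipeline would segment audio with the VAD only for non-streaming STTs."""
    return not stt.capabilities.streaming
```

Under this model, a pipeline that sees `streaming=False` would pass only VAD-detected clips to `recognize`, rather than forwarding every frame.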