agents: why does the STT still receive so much silent audio?

Feature Type

Nice to have

Feature Description

The VAD should filter out silent audio, but the STT module I implemented still receives silent audio. Why is that?

Workarounds / Alternatives

No response

Additional Context

No response

Dec 11 '25 09:12 zhaojiangbing

Hi, do you mean you pre-process audio in stt_node? Could you share more details on what you are trying to accomplish?

Dec 11 '25 20:12 tinalenguyen

The STT module comes after the VAD, so in theory it shouldn't receive a lot of silent audio, right?

Dec 12 '25 01:12 zhaojiangbing

Are you using a streaming or a non-streaming STT? If it's streaming, all audio frames are sent to the STT. If it's a non-streaming STT, could you share some of the audio clips the STT received so we can better understand the issue?

Dec 12 '25 03:12 longcw

I'm using a streaming STT. So how do I tell whether audio is silence or contains speech?

Dec 12 '25 03:12 zhaojiangbing

If it's a streaming STT, it's the STT's responsibility to detect speech and the end of the user's turn. Otherwise, you can use non-streaming mode:

streaming: all audio frames are sent to the STT as a stream, and the STT returns transcripts whenever it detects speech
non-streaming: only the speech clips detected by the VAD are sent to the STT
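To make the difference concrete, here is a minimal, self-contained Python simulation of what the STT receives in each mode. It is not LiveKit code: `route_to_stt`, `is_speech`, and `SPEECH_RMS_THRESHOLD` are hypothetical names, and the "VAD" is a crude RMS-energy gate standing in for a real voice activity detector (which is a trained model and far more robust). The point is only that in streaming mode silent frames reach the STT, while in non-streaming mode only contiguous speech runs do.

```python
import math
from typing import List

Frame = List[float]  # one audio frame: PCM samples in [-1.0, 1.0]

# Hypothetical threshold; a real VAD does not work off raw energy like this.
SPEECH_RMS_THRESHOLD = 0.02


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def is_speech(frame: Frame) -> bool:
    """Toy VAD: a frame counts as speech if its RMS energy exceeds the threshold."""
    return rms(frame) > SPEECH_RMS_THRESHOLD


def route_to_stt(frames: List[Frame], streaming: bool) -> List[List[Frame]]:
    """Return what the STT would receive.

    streaming=True:  a single continuous stream with every frame, silence included.
    streaming=False: one clip per contiguous run of frames the toy VAD marks as speech.
    """
    if streaming:
        return [list(frames)]
    clips: List[List[Frame]] = []
    current: List[Frame] = []
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
        elif current:
            # Speech just ended: close the current clip.
            clips.append(current)
            current = []
    if current:
        clips.append(current)
    return clips


if __name__ == "__main__":
    silence = [0.0] * 160
    speech = [0.1, -0.1] * 80
    frames = [silence, silence, speech, speech, silence, speech, silence]
    print(len(route_to_stt(frames, streaming=True)[0]))                  # 7
    print([len(c) for c in route_to_stt(frames, streaming=False)])       # [2, 1]
```

With a streaming STT, all 7 frames (including the 4 silent ones) arrive, so the STT itself must decide what is speech; with a non-streaming STT, only the two speech clips are passed on.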

Dec 12 '25 03:12 longcw

How do I set up or configure non-streaming mode?

Dec 12 '25 04:12 zhaojiangbing

Here is my code.

Dec 12 '25 04:12 zhaojiangbing

If it's a custom STT, it should declare itself as non-streaming, for example by setting STTCapabilities.streaming=False.
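A rough sketch of the idea, under loud assumptions: the `STTCapabilities` dataclass below is a local stand-in named after the LiveKit class so the example runs without the library (the real one lives in `livekit.agents.stt` and may have more fields), and `MyBatchSTT`, `recognize`, and `input_mode` are hypothetical names. In livekit-agents itself you would subclass the framework's STT base class and hand it the capabilities; check the livekit-agents STT docs for the exact constructor and abstract methods in your version.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class STTCapabilities:
    """Local stand-in for livekit's STTCapabilities (the real class may differ)."""
    streaming: bool
    interim_results: bool


class MyBatchSTT:
    """Hypothetical custom STT that transcribes one complete clip at a time."""

    def __init__(self) -> None:
        # streaming=False: the pipeline should hand this STT VAD-segmented
        # speech clips rather than a continuous stream of every frame.
        self.capabilities = STTCapabilities(streaming=False, interim_results=False)

    def recognize(self, clip: List[bytes]) -> str:
        # Placeholder: a real implementation would call a recognition service here.
        return f"<transcript of {len(clip)} frames>"


def input_mode(stt) -> str:
    """Hypothetical dispatcher mirroring the behavior described in this thread."""
    if stt.capabilities.streaming:
        return "all frames (continuous stream)"
    return "VAD speech clips only"


if __name__ == "__main__":
    print(input_mode(MyBatchSTT()))  # VAD speech clips only
```

The design point is that the STT advertises what it can handle and the pipeline, not the STT, decides whether to run VAD segmentation in front of it.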

Dec 15 '25 03:12 longcw