Feature Request: Diarization Support for Plugins
First, thanks for creating such an excellent tool!
I was wondering if there are any plans to add support for speaker diarization when using plugins (e.g., Google, Deepgram, Azure). This would be extremely helpful for distinguishing between multiple speakers during real-time audio processing.
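For illustration, something along these lines would cover the use case. This is only a sketch of what the plugin option could look like: the `diarize=True` flag and the per-word `speaker` index are hypothetical and do not exist in the plugins today.

```python
# Hypothetical sketch of how diarization could be exposed by the STT plugins.
# The `diarize=` option and the per-word `speaker` index are proposals, not
# existing API; they only illustrate the requested behavior.
from livekit.plugins import deepgram

stt = deepgram.STT(
    diarize=True,  # proposed option, forwarded to the provider's diarization setting
)

# On a final transcript event, each word (or alternative) would then carry a
# speaker index so multiple speakers can be told apart, e.g.:
#   word.speaker -> 0, 1, 2, ...
```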
Looking forward to any updates on this feature. Thanks again!
I really need this for Deepgram. It's definitely supported on their side: https://developers.deepgram.com/docs/diarization
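On Deepgram's side it's a single query parameter: with `diarize=true`, every word in the response carries a `speaker` index. A minimal sketch of a direct API call (outside the plugin) for reference, assuming a local WAV file and a `DEEPGRAM_API_KEY` environment variable:

```python
import os
import requests

# Direct call to Deepgram's prerecorded /listen endpoint with diarization enabled.
# The file name and env var are placeholders.
with open("call.wav", "rb") as f:
    resp = requests.post(
        "https://api.deepgram.com/v1/listen",
        params={"diarize": "true", "punctuate": "true"},
        headers={
            "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
            "Content-Type": "audio/wav",
        },
        data=f,
    )
resp.raise_for_status()

words = resp.json()["results"]["channels"][0]["alternatives"][0]["words"]
for w in words:
    # With diarize=true, each word includes a "speaker" index (0, 1, ...).
    print(w["speaker"], w["word"], w["start"], w["end"])
```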
Hey, I think it should also support camera-based diarization, so you know not only what is being said by different people but also who is saying it.
Is it possible to add `start_time`, `end_time`, and `words` (when supported by the provider) to the agent's and the user's speech when it is committed? I had made this change in the voice pipeline agent, but it would be great to have this feature upstream in the voice agent.

Our platform displays transcripts alongside recordings, allowing users to click on a specific message to jump to that exact moment in the recording. Word-level timestamps also enable creating clips from any selected point in the transcript. Since these transcripts are used for usability research, we need timestamps at multiple points to serve as reference markers that support key research insights.

Example of the required format:
```json
[
  {
    "role": "assistant",
    "content": "Hey, how can I help you today?",
    "start": 1742134707.016703,
    "end": 1742134708.875511,
    "words": [
      { "text": "Hey,", "start": 1742134707.016781, "end": 1742134707.279758 },
      { "text": "how", "start": 1742134707.279854, "end": 1742134707.543107 },
      { "text": "can", "start": 1742134707.543137, "end": 1742134707.806495 },
      { "text": "I", "start": 1742134707.806515, "end": 1742134708.069907 },
      { "text": "help", "start": 1742134708.0699232, "end": 1742134708.332927 },
      { "text": "you", "start": 1742134708.332947, "end": 1742134708.5982761 },
      { "text": "today?", "start": 1742134708.598859, "end": 1742134708.875474 }
    ]
  }
]
```
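For context, this is roughly how we would consume it from the pipeline agent. The sketch below assumes `agent` is the `VoicePipelineAgent` instance created in the entrypoint and uses its speech-committed events; the `start`, `end`, and `words` fields on the committed message are the hypothetical additions this request is about.

```python
# Sketch only: msg.start, msg.end, and msg.words are the proposed fields and do
# not exist on the committed chat message today.
transcript: list[dict] = []

def on_speech_committed(msg, role: str) -> None:
    transcript.append({
        "role": role,
        "content": msg.content,
        "start": msg.start,  # proposed: segment start (epoch seconds)
        "end": msg.end,      # proposed: segment end (epoch seconds)
        "words": [           # proposed: word-level timings from the provider, when available
            {"text": w.text, "start": w.start, "end": w.end} for w in msg.words
        ],
    })

# `agent` is the VoicePipelineAgent instance created elsewhere in the entrypoint.
agent.on("user_speech_committed", lambda msg: on_speech_committed(msg, "user"))
agent.on("agent_speech_committed", lambda msg: on_speech_committed(msg, "assistant"))
```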