Expose transcription events to the agent (multimodal)
Users need a way to get the transcript for both the agent and participant(s). Currently, the expectation (based on the docs) is that the developer would listen for the `transcription_received` event. However, this event isn't emitted.
Could we provide a way for agents to access the transcripts from within the Python (and NodeJS) code?
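For context, the pattern the docs imply looks roughly like the sketch below. The handler signature and payload are my guess for illustration only, and, as described above, the event never actually fires:

```python
# Hypothetical sketch of the documented pattern; the payload shape is
# assumed for illustration. Per this issue, "transcription_received"
# is never emitted, so this handler never runs.
# agent = MultimodalAgent(...)  # created and started elsewhere
@agent.on("transcription_received")
def on_transcription_received(text: str):
    print(f"transcript: {text}")
```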
This is a major issue that is preventing us from upgrading to the multimodal agent; a quick resolution would be highly appreciated.
Fixed by https://github.com/livekit/agents/pull/1001. It will be released in livekit-agents==0.11.0 (tomorrow).
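For anyone else upgrading: a minimal sketch of consuming the new events from Python, assuming the post-#1001 API where each committed event carries a chat message whose `content` holds the transcript text (the payload shape is my assumption, not confirmed in this thread):

```python
from livekit.agents import llm
from livekit.agents.multimodal import MultimodalAgent

# agent = MultimodalAgent(model=model)  # created and started elsewhere

# Assumption: both events deliver an llm.ChatMessage whose .content
# holds the final transcript for that turn.
@agent.on("user_speech_committed")
def on_user_speech(msg: llm.ChatMessage):
    print(f"user:  {msg.content}")

@agent.on("agent_speech_committed")
def on_agent_speech(msg: llm.ChatMessage):
    print(f"agent: {msg.content}")
```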
Hi, I see that the `agent_speech_committed` event has been added, but it seems to wait for the entire response before emitting. Is there a way to get intermediate transcriptions from the agent? Thanks!
Bump
`agent_speech_committed` and `user_speech_committed` are not emitted in my multimodal Node.js project (created from your quickstart documentation), and they are not listed as events in the documentation (unlike Python). Will these be added soon? They are fairly crucial for us and the last major component we need before we can start testing with users.
@creightontaylor: see https://github.com/livekit/agents-js/issues/164. We're working on it this week.
@mike-r-mclaughlin awesome! I eagerly await this release! I'm confident we'll figure something out since you already have it for Python. In the meantime, I'm prepping some functionality around it.