Expose transcription events to the agent (multimodal)
Users need a way to get the transcript for both the agent and participant(s). Currently, the expectation (based on the docs) is that the developer would listen for the `transcription_received` event. However, this event isn't emitted.
Could we provide a way for agents to access the transcripts from within the Python (and NodeJS) code?
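For context, the pattern the docs imply looks roughly like the sketch below. The handler signature and payload are my guess for illustration only, and, as described above, the event never actually fires:

```python
# Hypothetical sketch of the documented pattern; the payload shape is
# assumed for illustration. Per this issue, "transcription_received"
# is never emitted, so this handler never runs.
# agent = MultimodalAgent(...)  # created and started elsewhere
@agent.on("transcription_received")
def on_transcription_received(text: str):
    print(f"transcript: {text}")
```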
This is a major issue that is preventing us from upgrading to the multimodal agent; a quick resolution would be highly appreciated.
Fixed by https://github.com/livekit/agents/pull/1001. It will be released in livekit-agents==0.11.0 (tomorrow).
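For anyone else upgrading: a minimal sketch of consuming the new events from Python, assuming the post-#1001 API where each committed event carries a chat message whose `content` holds the transcript text (the payload shape is my assumption, not confirmed in this thread):

```python
from livekit.agents import llm
from livekit.agents.multimodal import MultimodalAgent

# agent = MultimodalAgent(model=model)  # created and started elsewhere

# Assumption: both events deliver an llm.ChatMessage whose .content
# holds the final transcript for that turn.
@agent.on("user_speech_committed")
def on_user_speech(msg: llm.ChatMessage):
    print(f"user:  {msg.content}")

@agent.on("agent_speech_committed")
def on_agent_speech(msg: llm.ChatMessage):
    print(f"agent: {msg.content}")
```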
Hi, I see that the `agent_speech_committed` event has been added, but it seems to wait for the entire response before emitting. Is there a way to get intermediate transcriptions from the agent? Thanks!
Bump
`agent_speech_committed` and `user_speech_committed` are not emitted in my multimodal Node.js project (created from your quickstart documentation), and they are not listed as events in the documentation (unlike Python). Will these be added soon? They are fairly crucial for us and the last major component we need before we can start testing with users.
@creightontaylor: see https://github.com/livekit/agents-js/issues/164. We're working on it this week.
@mike-r-mclaughlin awesome! I eagerly await this release! I'm confident we'll figure something out since you already have it for Python. In the meantime, I'm prepping some functionality around it.