Implement Manual VAD Commit via Button for Controlled Speech Processing
I've implemented a button in the client that is supposed to ensure VAD (Voice Activity Detection) doesn't immediately commit my conversation and send it to the server. Instead, it should wait until I click the button again. My issue is that I can't find a function in the agent that allows me to mute or pause VAD, or at least make it wait until I manually commit the conversation to the server. The transport of the button click, etc., is already working perfectly via the 'data_received' event.
In short, VAD should hold off until the user has finished speaking and I give the command; only then should the agent start processing or transmitting the speech. Essentially, I need a simple button that manually commits the spoken input. Does anyone have any ideas on the best way to approach this? Perhaps I need the assistant to wait for a commit or something similar?
Have you found a solution yet? We are looking for the same thing!
Hi, I was able to mimic this by using before_llm_cb, blocking all requests indefinitely until a variable is set to true (which I receive over LiveKit data channels). Here is some helper code:
async def manual_mode_cb(
    agent: VoiceAssistant, chat_ctx: llm.ChatContext
) -> LLMStream:
    # If we are not in manual mode, let the assistant proceed as usual
    if not event_dispatcher.is_manual_control:
        return  # Do not block the assistant
    # If the user has not allowed the commit yet, block asynchronously
    while not event_dispatcher.is_user_allow_commit():
        await asyncio.sleep(1)
    return
event_dispatcher is my custom class that uses data channels to maintain live communication with the client.
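The same gating idea can be shown without LiveKit at all, using an asyncio.Event in place of the polling loop. This is only a sketch: CommitGate and its method names are invented for illustration and are not part of the LiveKit API; the real trigger would come from your data-channel handler.

```python
import asyncio


class CommitGate:
    # Hypothetical stand-in for the data-channel dispatcher: the
    # callback awaits an asyncio.Event instead of polling a flag.
    def __init__(self):
        self._commit = asyncio.Event()

    def allow_commit(self):
        # Would be called from the data-channel handler on button click
        self._commit.set()

    async def wait_for_commit(self):
        await self._commit.wait()
        self._commit.clear()  # re-arm the gate for the next utterance


async def main():
    gate = CommitGate()

    async def before_llm_like_cb():
        await gate.wait_for_commit()  # blocks until the "button click"
        return "processed"

    task = asyncio.create_task(before_llm_like_cb())
    await asyncio.sleep(0.05)  # user is still speaking; nothing happens
    assert not task.done()     # the callback is still blocked
    gate.allow_commit()        # simulate the commit button
    return await task


result = asyncio.run(main())
print(result)  # -> processed
```

Using an Event instead of `await asyncio.sleep(1)` in a loop reacts immediately to the button click rather than up to a second late.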
I thought before_llm_cb wasn't an async function?
@hari01584: Can you share more of your code so I can adapt it to mine? Where do I set "before_llm_cb"? Thank you.
Hey @ChrisFeldmeier, I have a solution that you can try.
class EventUIHandler:
    def __init__(self, room: Room):
        self._listen_event = False
        room.on("data_received", self.on_data_received)

    def on_data_received(self, data: DataPacket):
        # Only react to packets on our commit-button topic, so that
        # unrelated data packets don't reset the flag
        if data.topic == "ui:button_commit_voice":
            payload = data.data.decode("utf-8")
            self._listen_event = payload == "True"

    @property
    def listen_event(self):
        return self._listen_event


def llm_cb_factory(event_ui_listener: EventUIHandler):
    assert event_ui_listener is not None

    async def fn(assistant: VoiceAssistant, chat_ctx: ChatContext):
        # Block while the UI reports we are still listening; the data
        # channel flips listen_event to False when the user commits
        while event_ui_listener.listen_event:
            await asyncio.sleep(1)
        # Do any chat-context processing here, then forward to the LLM
        return assistant.llm.chat(chat_ctx=chat_ctx, fnc_ctx=assistant.fnc_ctx)

    return fn
async def entrypoint(ctx: JobContext):
    ...
    ui_listener = EventUIHandler(ctx.room)
    llm_cb = llm_cb_factory(ui_listener)
    assistant = VoiceAssistant(
        vad=silero_vad,
        stt=stt,
        llm=gpt,
        tts=tts,
        chat_ctx=chat_context,
        before_llm_cb=llm_cb,
        before_tts_cb=before_tts,
    )
    ...
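The payload-parsing half of the handler can be exercised without a live room by stubbing the packet. FakeDataPacket and EventUIHandlerLogic below are names I made up for this sketch; only the topic string and the "True"/"False" payload convention come from the handler above.

```python
from dataclasses import dataclass


@dataclass
class FakeDataPacket:
    """Stub standing in for LiveKit's DataPacket, just for illustration."""
    topic: str
    data: bytes


class EventUIHandlerLogic:
    """The payload-parsing logic, decoupled from Room for testing."""

    def __init__(self):
        self._listen_event = False

    def on_data_received(self, packet):
        if packet.topic == "ui:button_commit_voice":
            self._listen_event = packet.data.decode("utf-8") == "True"

    @property
    def listen_event(self):
        return self._listen_event


handler = EventUIHandlerLogic()
handler.on_data_received(FakeDataPacket("ui:button_commit_voice", b"True"))
assert handler.listen_event        # listening: before_llm_cb would block
handler.on_data_received(FakeDataPacket("other:topic", b"False"))
assert handler.listen_event        # unrelated packets leave the flag alone
handler.on_data_received(FakeDataPacket("ui:button_commit_voice", b"False"))
assert not handler.listen_event    # committed: the LLM call proceeds
```

Splitting the parsing out like this lets you verify the commit protocol before wiring it to a real room.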