
Implement Manual VAD Commit via Button for Controlled Speech Processing

Open ChrisFeldmeier opened this issue 1 year ago • 5 comments

I've implemented a button in the client that is supposed to ensure VAD (Voice Activity Detection) doesn't immediately commit my conversation and send it to the server. Instead, it should wait until I click the button again. My issue is that I can't find a function in the agent that allows me to mute or pause VAD, or at least make it wait until I manually commit the conversation to the server. The transport of the button click, etc., is already working perfectly via the 'data_received' event.

In short, VAD should wait for my command: only once the user has finished speaking and I confirm it should the agent start processing or transmitting the speech. Essentially, I need a simple button that manually commits the spoken input. Does anyone have any ideas on the best way to approach this? Perhaps I need the assistant to wait for a commit or something similar?

Let me know if you need further adjustments!

ChrisFeldmeier avatar Sep 15 '24 13:09 ChrisFeldmeier

Have you found a solution yet? We are looking for the same thing!

hari01584 avatar Oct 03 '24 08:10 hari01584

Hi, I was able to mimic this by using before_llm_cb, blocking all requests indefinitely until a variable becomes true (which I set via LiveKit data channels). Here is some helper code:

    async def manual_mode_cb(
        agent: VoiceAssistant, chat_ctx: llm.ChatContext
    ) -> LLMStream:
        # If we are not in manual mode, do not block the assistant.
        if not event_dispatcher.is_manual_control:
            return

        # Otherwise, block asynchronously until the user allows the commit.
        while not event_dispatcher.is_user_allow_commit():
            await asyncio.sleep(1)

        return

The event dispatcher is my custom class that uses data channels to maintain live communication with the client.

hari01584 avatar Oct 03 '24 09:10 hari01584

I thought before_llm_cb is not an async function?

fathrahh avatar Oct 11 '24 06:10 fathrahh
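For what it's worth, a callback slot can accept both sync and async callables. Here is an illustration of the general pattern (my own sketch, not the livekit-agents source): call the callback, and await the result only if it turns out to be awaitable.

```python
import asyncio
import inspect

async def run_callback(cb, *args):
    """Invoke cb, awaiting the result only when it is a coroutine/awaitable."""
    result = cb(*args)
    if inspect.isawaitable(result):
        result = await result
    return result

def sync_cb(x):
    return x + 1

async def async_cb(x):
    await asyncio.sleep(0)
    return x + 2
```

With this pattern, `run_callback(sync_cb, 1)` and `run_callback(async_cb, 1)` both work, which is why passing an async function as before_llm_cb can be fine.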

@hari01584: Can you share more of your code so I can implement it in mine? Where do I set "before_llm_cb"? Thank you.

ChrisFeldmeier avatar Oct 11 '24 23:10 ChrisFeldmeier

Hey @ChrisFeldmeier, I have a solution that you can try.

class EventUIHandler:
    def __init__(self, room: Room):
        self._listen_event = False
        room.on("data_received", self.on_data_received)

    def on_data_received(self, data: DataPacket):
        topic = "ui:button_commit_voice"

        if data.topic == topic:
            payload = data.data.decode("utf-8")
            self._listen_event = payload == "True"

    @property
    def listen_event(self):
        return self._listen_event


def llm_cb_factory(event_ui_listener: EventUIHandler):
    assert event_ui_listener is not None

    async def fn(assistant: VoiceAssistant, chat_ctx: ChatContext):
        # Block until the client releases the hold via the data channel.
        while event_ui_listener.listen_event:
            await asyncio.sleep(1)

        # Do the chat (or any other chat-context processing) here.
        return assistant.llm.chat(chat_ctx=chat_ctx, fnc_ctx=assistant.fnc_ctx)

    return fn


async def entrypoint(ctx: JobContext):
    ...
    ui_listener = EventUIHandler(ctx.room)
    llm_cb = llm_cb_factory(ui_listener)
    assistant = VoiceAssistant(
        vad=silero_vad,
        stt=stt,
        llm=gpt,
        tts=tts,
        chat_ctx=chat_context,
        before_llm_cb=llm_cb,
        before_tts_cb=before_tts,
    )
    ...

fathrahh avatar Oct 14 '24 13:10 fathrahh
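One small refinement on the handler above: the topic/payload parsing can be pulled into a pure function (my own extraction, not part of fathrahh's code), assuming the same "ui:button_commit_voice" topic and "True"/"False" string payloads. Keeping it separate from the Room handler makes it easy to unit-test without a LiveKit connection:

```python
from typing import Optional

COMMIT_TOPIC = "ui:button_commit_voice"

def parse_commit_event(topic: Optional[str], payload: bytes) -> Optional[bool]:
    """Return the new listen state for a commit-button packet, None for other topics."""
    if topic != COMMIT_TOPIC:
        return None
    return payload.decode("utf-8") == "True"
```

Inside `on_data_received` you would then write `state = parse_commit_event(data.topic, data.data)` and update `self._listen_event` only when `state is not None`.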