
Addition of @assistant.on("function_calls_started")

Open · birksy89 opened this issue 1 year ago · 10 comments

Would it be possible to add something like: @assistant.on("function_calls_started")

We currently have @assistant.on("function_calls_finished")... But some of the functions which I'm calling take a couple of seconds to respond with data.

It would be great to be able to tap into a "starting" state, where I could then:

await assistant.say("Hmmm, let me just look that up for you - Give me 2 seconds.")

Any advice or input would be greatly appreciated.
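For illustration, a minimal sketch of how the proposed event could be used; the event name, its handler argument, and the in-scope assistant are assumptions here, since this hook does not exist in the current API:

```python
import asyncio

# Hypothetical: "function_calls_started" is the *proposed* event, not a real one.
@assistant.on("function_calls_started")
def on_function_calls_started(called_functions):
    # Event callbacks are synchronous, so schedule the speech as a task.
    asyncio.create_task(
        assistant.say("Hmmm, let me just look that up for you - give me 2 seconds.")
    )
```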

birksy89 avatar Jun 28 '24 13:06 birksy89

Hey, can function_calls_collected satisfy your needs?
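A minimal sketch of what that could look like, assuming function_calls_collected fires once the LLM has emitted its function calls but before they finish running, and that assistant is the VoiceAssistant instance:

```python
import asyncio

# Speak a filler line as soon as the calls are collected, i.e. before the
# (possibly slow) functions themselves have returned.
@assistant.on("function_calls_collected")
def on_function_calls_collected(called_functions):
    asyncio.create_task(assistant.say("Let me just look that up for you."))
```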

theomonnom avatar Jul 26 '24 12:07 theomonnom

You could just instruct the LLM to always return a spoken response along with the function call. This works the majority of the time for me, but I've found there are also times when it ignores this instruction despite strong prompting.

Or you could include a chat completion request in the body of the function call itself, particularly if some of your function calls require this "hold on" dialogue and some don't.
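A sketch of that second idea, assuming the v0.x FunctionContext / ai_callable API; assistant is assumed to be in scope and fetch_weather is a hypothetical slow backend call:

```python
from livekit.agents import llm

class AssistantFnc(llm.FunctionContext):
    @llm.ai_callable(description="Look up the current weather for a location")
    async def get_weather(self, location: str):
        # Speak the "hold on" line only in the functions that need it,
        # before the slow work starts.
        await assistant.say("One moment while I look that up.")
        return await fetch_weather(location)  # hypothetical slow backend call
```

The function context would then be passed to the assistant, e.g. VoiceAssistant(..., fnc_ctx=AssistantFnc()).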

willsmanley avatar Jul 27 '24 06:07 willsmanley

I don't know if this helps you, but it helped me! This is the way I receive events from the agent server and send them via publish_data to the client app side.

See the full code in attached .txt file here: agent.txt

```python
assistant = VoiceAssistant(
    vad=silero.VAD(),
    stt=deepgram.STT(language="de-DE"),
    llm=openai.LLM(model="gpt-4o"),
    tts=elevenlabs.TTS(voice=VOICE1, language="de"),
    chat_ctx=initial_ctx,
    allow_interruptions=True,
    interrupt_volume=0.20,
    interrupt_speech_duration=0.65,
    interrupt_min_words=1,
    base_volume=1.0,
    transcription_speed=1.83,
)

# Start the assistant
assistant.start(ctx.room)

# Event handlers
assistant.on('user_started_speaking', lambda: user_started_speaking(assistant, ctx.room.local_participant))
assistant.on('user_stopped_speaking', lambda: user_stopped_speaking(assistant, ctx.room.local_participant))
assistant.on('agent_started_speaking', lambda: agent_started_speaking(assistant, ctx.room.local_participant))
assistant.on('agent_stopped_speaking', lambda: agent_stopped_speaking(assistant, ctx.room.local_participant))
assistant.on('agent_speech_interrupted', lambda: agent_stopped_speaking(assistant, ctx.room.local_participant))

# Register data received handler
ctx.room.on("data_received", lambda dp: asyncio.create_task(handle_data_received(dp, assistant)))

# Register the stop_speaking event handler
assistant.on('stop_speaking', lambda: stop_speaking(assistant))
```
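For reference, one of those handlers might look like this; a sketch where the payload shape is my own choice, and the handler names come from the snippet above:

```python
import asyncio
import json

async def _publish(participant, event: str):
    # publish_data sends an arbitrary payload to the other participants in the room.
    await participant.publish_data(json.dumps({"event": event}).encode("utf-8"))

def user_started_speaking(assistant, participant):
    # Event callbacks are synchronous, so schedule the publish as a task.
    asyncio.create_task(_publish(participant, "user_started_speaking"))
```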

ChrisFeldmeier avatar Sep 08 '24 20:09 ChrisFeldmeier

> Would it be possible to add something like: @assistant.on("function_calls_started")
>
> We currently have @assistant.on("function_calls_finished")... But some of the functions I'm calling take a couple of seconds to respond with data.
>
> It would be great to be able to tap into a "starting" state, where I could then:
>
> await assistant.say("Hmmm, let me just look that up for you - Give me 2 seconds.")
>
> Any advice or input would be greatly appreciated.

I ran into the same scenario. Have you solved it? Any suggestions would be great. @birksy89

zamia avatar Oct 15 '24 19:10 zamia

@zamia I'm currently waiting for the Node version of the API to be stabilised a bit more. I don't know Python, and I think a lot of the friction I was getting may be partly down to that.

There's also the new realtime API, which is a different paradigm and can hopefully solve this.

birksy89 avatar Oct 16 '24 09:10 birksy89

Thanks @birksy89. I solved it by using an LLM prompt that makes the LLM attach text content along with the function call. This is supported in the newest version of livekit-agents. I hope it helps you too.

zamia avatar Oct 16 '24 17:10 zamia

@zamia can you provide an example snippet?

Does the LLM return the text before the function call is finished?

birksy89 avatar Oct 17 '24 10:10 birksy89

The response text arrives alongside the function call; they are in the same response. Like this (this log is from claude-3.5-sonnet):

```python
{'role': 'user', 'content': [{'text': 'Tell me the weather of Los Angeles.', 'type': 'text'}]},
{'role': 'assistant', 'content': [
    {'text': ' Sure thing! Let me check that for you. Hmm, just a moment please.', 'type': 'text'},
    {'id': 'toolu_01Av6PofMJz1foNH8fyYe6bY', 'type': 'tool_use', 'name': 'get_weather', 'input': {'location': 'Los Angeles'}},
]}
```

The system prompt is like this:

```
......some system prompt here....
Create a response that includes an interjection before calling the function, to simulate processing time while the backend queries the API. The response along with the function call should be like 'Ok, let me check on that, wait a moment please.' Remember this response is along with the function call.
```
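A sketch of wiring such an instruction into the assistant's initial context, assuming the v0.x ChatContext API; the wording of the instruction is just an example:

```python
from livekit.agents import llm

initial_ctx = llm.ChatContext().append(
    role="system",
    text=(
        "...your existing instructions...\n"
        "When you decide to call a function, include a short spoken interjection "
        "in the same response, e.g. 'Ok, let me check on that, one moment please.'"
    ),
)
```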

I also found that gpt-4o-mini does not handle this properly, but gpt-4o and claude-3.5-sonnet work fine.

zamia avatar Oct 17 '24 20:10 zamia

@zamia thanks for the reply.

Yeah, this is what I ended up doing originally, but the response always came just before the function call.

So there was a gap (while the function was processed), and then the "let me check" text and the function call response came together shortly after.

It was probably down to the model being used; I was trying these things very early in the release.

I personally still think there's a place for the originally proposed:

@assistant.on("function_calls_started")

Hopefully one day someone comes along with a solid solution, or the new realtime API handles this differently (I've yet to dig into that much).

birksy89 avatar Oct 18 '24 08:10 birksy89