
Enhanced control over instructions and chat context in LLM interaction

Open mrkowalski opened this issue 3 months ago • 3 comments

Feature Type

Would make my life easier

Feature Description

With the current LiveKit Agents it does not seem to be possible to adjust the instructions and chat context passed to the LLM on a per-request basis.

There is the livekit.agents.voice.agent_session.AgentSession.generate_reply method, which allows for ad-hoc instruction adjustments, but it has certain limitations:

  • It only allows appending to the existing agent instructions, not replacing them.
  • It only allows appending a single message to the existing chat context; it does not allow replacing it (see the sketch below).
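
For illustration, this is roughly the call shape today; the parameter names are my reading of the current 1.x API and should be treated as assumptions:

```python
# Rough sketch of the current generate_reply call shape (parameter names are
# assumptions based on my reading of the livekit-agents 1.x API).
from livekit.agents import AgentSession

async def nudge(session: AgentSession) -> None:
    # `instructions` is appended to the agent's standing instructions and the
    # user input is appended to the existing chat_ctx -- neither can be
    # replaced for just this one inference.
    await session.generate_reply(
        instructions="Ask for consent and stop after registering it.",
        user_input="I'd like to continue.",
    )
```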

I have found this very cumbersome for complex agentic scenarios, where fine-grained control over agent responses is needed deep into a lengthy conversation. Some of the problems I encountered due to the current limitations are:

  • LLMs tend to ignore tool_choice='none' when there are multiple tool calls in chat_ctx, which triggers UNEXPECTED_TOOL_CALL_ERROR in Gemini.
  • LLMs tend to ignore instructions appended during generate_reply when the conversation follows a certain direction and the intention of generate_reply is to change that direction, for instance "Ask for consent, register it via function call but do not ask any further questions." The "no further questions" part has a great chance of being ignored if there is a Q&A sequence in chat_ctx.

The standard solution for instruction control is to use multiple agents, but that does not solve the problem of chat_ctx control: not passing chat_ctx between agents prevents agents down the line from using it in case they need it.

I propose extending livekit.agents.voice.agent_session.AgentSession.generate_reply to enable the requested functionality: allow replacing the instructions and chat_ctx for the duration of the generate_reply call.
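
As a rough illustration only, a call under the proposed extension could look like this; the replace_instructions flag and the per-call chat_ctx argument are hypothetical and do not exist in livekit-agents today:

```python
# Hypothetical call shape for the proposal -- `replace_instructions` and the
# per-call `chat_ctx` argument are illustrative, not an existing API.
from livekit.agents import AgentSession, ChatContext

async def ask_for_consent(session: AgentSession) -> None:
    consent_ctx = ChatContext()
    consent_ctx.add_message(role="user", content="We reached the consent step.")

    await session.generate_reply(
        instructions=(
            "Ask for consent, register it via function call, "
            "and do not ask any further questions."
        ),
        replace_instructions=True,  # hypothetical: replace instead of append
        chat_ctx=consent_ctx,       # hypothetical: use as the whole context for this call
    )
```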

I am happy to implement it, as I already use a modified version in my project.

Let me know if this makes sense.

Workarounds / Alternatives

  • I considered multiple agents, but it prevented chat_ctx control and introduced unnecessary complexity.
  • I used generate_reply as it is, but faced the problems described above.
  • I copied generate_reply into my base agent implementation and modified it. It works as intended, but it means copy-pasting a sizeable amount of code from LiveKit Agents.

Additional Context

No response

mrkowalski · Nov 07 '25

Hi, could you share a bit more on your workaround for generate_reply() and how you would extend it? Do you swap out the instructions and context for the call and add the LLM message back as well?

The standard solution for instruction control is to use multiple agents, but that does not solve the problem of chat_ctx control: not passing chat_ctx between agents prevents agents down the line from using it in case they need it.

Also, for this I believe you can store chat_ctx in userdata for future agents to access.
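
Roughly something like this (a sketch only; SharedState and prior_ctx are just illustrative names, and the exact ChatContext details may differ):

```python
# Sketch of carrying chat_ctx forward via typed userdata; SharedState and
# prior_ctx are invented names for illustration.
from dataclasses import dataclass, field

from livekit.agents import Agent, AgentSession, ChatContext

@dataclass
class SharedState:
    prior_ctx: ChatContext = field(default_factory=ChatContext)

session: AgentSession[SharedState] = AgentSession(userdata=SharedState())

class FollowUpAgent(Agent):
    async def on_enter(self) -> None:
        # a later agent can read (or merge in) the stored context if it needs it
        prior = self.session.userdata.prior_ctx
        print(f"carried over prior context: {prior!r}")
```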

tinalenguyen · Nov 07 '25

Hi, could you share a bit more on your workaround for generate_reply() and how you would extend it? Do you swap out the instructions and context for the call and add the LLM message back as well?

My version of generate_reply takes instructions and user_message as input. It has a flag that allows choosing between appending user_message to the existing chat_ctx and using user_message as the whole chat_ctx. instructions and chat_ctx are changed only for the duration of the call. The rest is the same as the original, so the whole conversation still ends up in chat_ctx.

It just gives more control.
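
Not the actual patch (my workaround copies generate_reply's internals), but the effect can be approximated with the public Agent methods, assuming update_instructions, update_chat_ctx and chat_ctx.copy() work the way I expect in livekit-agents 1.x:

```python
# Approximation only: the real workaround copies generate_reply itself; this
# sketch just mimics the flag semantics via public Agent methods.
from livekit.agents import Agent, ChatContext

class ControlledAgent(Agent):
    async def scoped_reply(
        self,
        *,
        instructions: str,
        user_message: str,
        replace_chat_ctx: bool = False,
    ) -> None:
        # build the context for this one inference
        ctx = ChatContext() if replace_chat_ctx else self.chat_ctx.copy()
        ctx.add_message(role="user", content=user_message)

        await self.update_instructions(instructions)  # swap instructions for this call
        await self.update_chat_ctx(ctx)
        await self.session.generate_reply()
        # the real version also restores the standing instructions afterwards
        # while keeping the generated reply in chat_ctx
```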

As for userdata: this adds complexity, requires concatenating chat_ctx's, and would effectively be a workaround for the limitations of generate_reply, which seems intended for this very purpose but is too limited.

mrkowalski · Nov 07 '25

@mrkowalski We're not sure if allowing that much fine-grained control in generate_reply() is on our roadmap, but would your use case be similar to this one, where you only need control over one LLM call? Perhaps we can introduce another method to allow separate inferencing that wouldn't interfere with the main conversation flow.

As for recent instructions in generate_reply() not carrying as much weight, what do you think of collapsing (summarizing) the chat context so that recent interactions carry more weight? This would be done within the framework: say, for every 50 chat messages (an arbitrary number that could be set by the user), the context gets summarized.
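
To make that concrete, here is a minimal, SDK-agnostic sketch of the policy; messages are plain dicts and summarize() stands in for whatever summarization step would run (e.g. a cheap LLM call over the older turns):

```python
# Sketch of the collapsing policy; summarize() is hypothetical and only the
# thresholding logic is illustrated here.
SUMMARIZE_EVERY = 50  # arbitrary threshold, could be user-configurable
KEEP_RECENT = 10      # recent turns kept verbatim so they carry more weight

async def maybe_collapse(messages: list[dict], summarize) -> list[dict]:
    if len(messages) < SUMMARIZE_EVERY:
        return messages
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = await summarize(old)
    return [{"role": "system", "content": f"Conversation so far: {summary}"}, *recent]
```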

tinalenguyen · Nov 14 '25