
Feature request: add support for streaming tool use

Open lsorber opened this issue 1 year ago • 3 comments

The combination of `stream=True` and `tool_choice="auto"` currently raises an exception, which leaves developers with one of two unfortunate choices:

  1. Build an application that streams the response but cannot use tools
  2. Build an LLM application that can use tools but cannot stream the response

Relevant discussion: https://github.com/abetlen/llama-cpp-python/discussions/1615
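To make the failing combination concrete, here is a minimal sketch of an OpenAI-style chat-completion request for llama-cpp-python's server. The model name and the `get_weather` tool are hypothetical placeholders; the point is the pairing of `"stream": True` with `"tool_choice": "auto"`, which is the request shape that currently triggers the exception:

```python
import json

# Hypothetical request payload for the OpenAI-compatible
# /v1/chat/completions endpoint. As of this issue, combining
# stream=True with tool_choice="auto" fails server-side.
payload = {
    "model": "local-model",  # placeholder model name
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call a tool
    "stream": True,         # this combination currently raises an exception
}

# The payload itself is valid JSON; only the server rejects the combination.
print(json.dumps(payload)["streamable" != "x" and 0:40])
```

Dropping either `"stream": True` or the `tools`/`tool_choice` fields makes the request succeed, which is exactly the trade-off described above.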

lsorber avatar Dec 25 '24 23:12 lsorber

Admittedly this is the wrong place to ask this question, but as a beginner I feel like you're the right person to answer:

Does something need to be done in llama.cpp directly in order to handle streaming tool calls? I see from your feature branch that you added a RAG layer to this Python implementation. I ask because I built llama.cpp from source, figuring it would be better optimized for my system, but I am stuck with this server error: `{"code":500,"message":"Cannot use tools with stream","type":"server_error"}`.

Would this error go away if I installed the pre-built Python version instead?

Edit: I see here that there's a PR in draft. We're too close to the bleeding edge!

SaymV avatar Apr 22 '25 21:04 SaymV

Llama.cpp is still waiting on https://github.com/ggml-org/llama.cpp/pull/12379

I'm not sure how this Python library handles tools; I think it's somewhat different, though.

edmcman avatar Apr 22 '25 21:04 edmcman

https://github.com/ggml-org/llama.cpp/pull/12379 has been merged!

kooshi avatar May 25 '25 17:05 kooshi