llama-cpp-python

openai API `n` argument is ignored

Open · BenjaminMarechalEVITECH opened this issue 1 year ago • 1 comment

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [X] I carefully followed the README.md.
  • [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [X] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Current Behavior

I'm running the llama_cpp server with the following command:

python3 -m llama_cpp.server --model models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf --clip_model_path models/mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf --model_alias llava-v1.5-13b-q4_k --chat_format llava-1-5 --port 10322

(models downloaded from https://huggingface.co/mys/ggml_llava-v1.5-13b/tree/main)

When I call the server using openai python package:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10322/v1", # "http://<Your api-server IP>:port"
    api_key = "sk-no-key-required"
)

chat_completion = client.chat.completions.create(
    model="models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
    n=2,
)
print(len(chat_completion.choices))  # Returns 1, but should be 2.

According to the OpenAI API reference, the `n` argument is "How many chat completion choices to generate for each input message." It seems to be ignored by the server.
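
As a client-side workaround (a minimal sketch on my side, assuming the server otherwise behaves as shown above), the request can simply be issued once per desired choice:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10322/v1",
    api_key="sk-no-key-required",
)

n = 2  # desired number of choices
choices = []
for _ in range(n):
    completion = client.chat.completions.create(
        model="models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf",
        messages=[
            {"role": "user", "content": "Write a limerick about python exceptions"}
        ],
    )
    choices.extend(completion.choices)

print(len(choices))  # 2, at the cost of one full request per choice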

Environment and Context

llama_cpp installed with `pip install llama-cpp-python[server]`
print(llama_cpp.__version__): 0.3.6
print(openai.__version__): 1.59.7

BenjaminMarechalEVITECH avatar Jan 24 '25 20:01 BenjaminMarechalEVITECH

Hi, if you check the code at llama_cpp/server/app.py, some parameters are explicitly excluded:

...
exclude = {
    "n",
    "logit_bias_type",
    "user",
    "min_tokens",
}
kwargs = body.model_dump(exclude=exclude)

I think the reason is that llama.cpp doesn't support it yet.
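
For illustration, here is a minimal standalone sketch of what that exclusion does to the request body before it is passed on (the request model below is a simplified, hypothetical stand-in, not the actual server class):

from pydantic import BaseModel

# Simplified stand-in for the server's request model (hypothetical fields, for illustration only).
class ChatCompletionRequest(BaseModel):
    model: str
    n: int = 1
    temperature: float = 0.8

body = ChatCompletionRequest(model="llava-v1.5-13b-q4_k", n=2)

# Same exclusion set as in llama_cpp/server/app.py; names not present on the model are ignored.
exclude = {"n", "logit_bias_type", "user", "min_tokens"}
kwargs = body.model_dump(exclude=exclude)

print(kwargs)  # {'model': 'llava-v1.5-13b-q4_k', 'temperature': 0.8} -- 'n' is silently dropped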

DanieleMorotti avatar Mar 04 '25 08:03 DanieleMorotti