
Misc. bug: When using streaming output, if stream_options={"include_usage": True} is not set, the returned result should not include usage stats

Open allenz92 opened this issue 1 year ago • 0 comments

Name and Version

version: 4658 (855cd073) built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

/app/llama-server -ngl 999  --metrics -m /data/model/DeepSeek-V3-Q4_K_M.gguf --port 8000 --host 0.0.0.0 --ctx-size 32768 --n-predict 4096  --batch-size 1024  --log-file /var/log/run.log -a DeepSeek-V3 --parallel 32

Problem description & steps to reproduce

Related documentation:

  • https://community.openai.com/t/usage-stats-now-available-when-using-streaming-with-the-chat-completions-api-or-completions-api/738156
  • https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options
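Per the OpenAI docs linked above, a streaming response should carry usage stats only when the client opts in via stream_options. A minimal sketch of that contract (pure Python, illustrative chunk shapes only, not llama.cpp's actual server code):

```python
def chunks_for_stream(content_deltas, usage, include_usage=False):
    """Yield chat.completion.chunk dicts per the OpenAI streaming contract.

    When include_usage is False (the default), no chunk carries a
    "usage" field at all.  When True, every chunk carries "usage": None
    and one extra final chunk (with empty choices) carries the real
    usage numbers.
    """
    for delta in content_deltas:
        chunk = {
            "object": "chat.completion.chunk",
            "choices": [{"index": 0,
                         "delta": {"content": delta},
                         "finish_reason": None}],
        }
        if include_usage:
            chunk["usage"] = None
        yield chunk

    stop = {
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    if include_usage:
        stop["usage"] = None
    yield stop

    if include_usage:
        # Final extra chunk: empty choices, real usage stats.
        yield {"object": "chat.completion.chunk",
               "choices": [],
               "usage": usage}
```

The reported bug is that llama-server behaves as if include_usage were always true for the final chunk, even when the request omits stream_options entirely.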

curl command:

curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --data '{
  "model": "deepseek-ai/DeepSeek-V3",
  "messages": [
    {
      "role": "user",
      "content": "hello"
    }
  ],
  "max_tokens": 5,
  "temperature": 0.7,
  "top_p": 0.9,
  "n": 1,
  "stream":true,
  "stop": ["\n"]
}'

request body:

{
  "model": "deepseek-ai/DeepSeek-V3",
  "messages": [
    {
      "role": "user",
      "content": "hello"
    }
  ],
  "max_tokens": 5,
  "temperature": 0.7,
  "top_p": 0.9,
  "n": 1,
  "stream":true,
  "stop": ["\n"]
}

response:

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"Hello"}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"!"}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":" How"}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":" can"}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":" I"}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":"length","index":0,"delta":{}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk","usage":{"completion_tokens":5,"prompt_tokens":4,"total_tokens":9},"timings":{"prompt_n":2,"prompt_ms":94.377,"prompt_per_token_ms":47.1885,"prompt_per_second":21.191603886540154,"predicted_n":5,"predicted_ms":211.324,"predicted_per_token_ms":42.2648,"predicted_per_second":23.660350930324995}}

data: [DONE]
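For reference, a small helper to scan a captured SSE transcript for chunks that carry a usage field (a sketch; the sample payload in the test is abbreviated from the response above):

```python
import json

def chunks_with_usage(sse_text):
    """Return indices of streamed chunks that carry a 'usage' field.

    Parses 'data:' lines from a Server-Sent Events transcript,
    skipping the terminal '[DONE]' sentinel.
    """
    hits = []
    data_lines = (l for l in sse_text.splitlines() if l.startswith("data:"))
    for i, line in enumerate(data_lines):
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            continue
        if "usage" in json.loads(payload):
            hits.append(i)
    return hits
```

Run against the transcript above, this flags only the final chunk, confirming that usage (and the llama.cpp-specific timings object) is emitted even though stream_options was never set in the request.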

First Bad Commit

No response

Relevant log output


allenz92 · Feb 28 '25 05:02