llama.cpp
Misc. bug: When using streaming output, the response should not include usage stats unless stream_options={"include_usage": true} is set
Name and Version
version: 4658 (855cd073) built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
/app/llama-server -ngl 999 --metrics -m /data/model/DeepSeek-V3-Q4_K_M.gguf --port 8000 --host 0.0.0.0 --ctx-size 32768 --n-predict 4096 --batch-size 1024 --log-file /var/log/run.log -a DeepSeek-V3 --parallel 32
Problem description & steps to reproduce
When streaming is enabled but stream_options is not set, llama-server still attaches "usage" (and "timings") to the final chunk. Per the OpenAI API, usage stats should only be streamed when the client opts in via stream_options={"include_usage": true}.
Related documents:
- https://community.openai.com/t/usage-stats-now-available-when-using-streaming-with-the-chat-completions-api-or-completions-api/738156
- https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options
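Per the OpenAI documents linked above, usage belongs in the stream only when the client opts in. A minimal sketch of that gating logic in Python (hypothetical helper names, not llama-server's actual code path):

```python
def should_include_usage(request_body: dict) -> bool:
    """Per the OpenAI streaming spec, usage stats belong in the stream
    only when the client opts in via stream_options.include_usage."""
    opts = request_body.get("stream_options") or {}
    return bool(request_body.get("stream")) and opts.get("include_usage") is True

def build_final_chunk(base_chunk: dict, request_body: dict, usage: dict) -> dict:
    """Attach usage to the final chunk only when the client asked for it
    (hypothetical helper; illustrates the expected behavior, not the fix)."""
    chunk = dict(base_chunk)
    if should_include_usage(request_body):
        chunk["usage"] = usage
    return chunk

# The request from this report: "stream": true, but no stream_options
req = {"model": "deepseek-ai/DeepSeek-V3", "stream": True}
chunk = build_final_chunk({"object": "chat.completion.chunk"}, req,
                          {"prompt_tokens": 4, "completion_tokens": 5, "total_tokens": 9})
assert "usage" not in chunk  # expected: no usage stats without opt-in
```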
curl command:
curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --data '{
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
      {
        "role": "user",
        "content": "hello"
      }
    ],
    "max_tokens": 5,
    "temperature": 0.7,
    "top_p": 0.9,
    "n": 1,
    "stream": true,
    "stop": ["\n"]
  }'
response:
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"Hello"}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"!"}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":" How"}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":" can"}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":" I"}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":"length","index":0,"delta":{}}],"created":1740721736,"id":"chatcmpl-uIjo6Xo5CDClL7yq219AAkd9xFk4SMsd","model":"deepseek-ai/DeepSeek-V3","system_fingerprint":"b4658-855cd073","object":"chat.completion.chunk","usage":{"completion_tokens":5,"prompt_tokens":4,"total_tokens":9},"timings":{"prompt_n":2,"prompt_ms":94.377,"prompt_per_token_ms":47.1885,"prompt_per_second":21.191603886540154,"predicted_n":5,"predicted_ms":211.324,"predicted_per_token_ms":42.2648,"predicted_per_second":23.660350930324995}}
data: [DONE]
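A quick check over the final chunk above makes the mismatch concrete: the request never set stream_options, yet the last chunk carries a usage object. A sketch of that check, using the relevant fields from the chunk as received:

```python
import json

# Final SSE chunk from the response above (trimmed to the relevant fields)
final_chunk = json.loads("""{
  "choices": [{"finish_reason": "length", "index": 0, "delta": {}}],
  "object": "chat.completion.chunk",
  "usage": {"completion_tokens": 5, "prompt_tokens": 4, "total_tokens": 9}
}""")

requested_usage = False  # the request set "stream": true but no stream_options
print("usage present:", "usage" in final_chunk)  # True: the server sent it anyway
print("usage requested:", requested_usage)       # False: the client never opted in
```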
First Bad Commit
No response
Relevant log output