
Metrics API Endpoint Error: * collected metric "api_call" { [...] } was collected before with the same name and label values

Status: Open. countzero opened this issue 2 years ago (10 comments).

LocalAI version: 7641f92

Environment, CPU architecture, OS, and Version:

Linux ... 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug: The /metrics API endpoint works for a while, but after a number of completion requests it starts failing with an error message.

To Reproduce

  1. Start LocalAI
  2. Send several requests to the /v1/chat/completions API endpoint
  3. Query the /metrics API endpoint

Expected behavior: The /metrics API endpoint should keep working reliably, regardless of how many completion requests have been served.

Logs

An error has occurred while serving metrics:

2 error(s) occurred:

  • collected metric "api_call" { label:{name:"method" value:"GETT"} label:{name:"otel_scope_name" value:"github.com/go-skynet/LocalAI"} label:{name:"otel_scope_version" value:""} label:{name:"path" value:"/v1/chat/completions"} histogram:{sample_count:2 sample_sum:7358.754251354 bucket:{cumulative_count:0 upper_bound:0} bucket:{cumulative_count:0 upper_bound:5} bucket:{cumulative_count:0 upper_bound:10} bucket:{cumulative_count:0 upper_bound:25} bucket:{cumulative_count:0 upper_bound:50} bucket:{cumulative_count:0 upper_bound:75} bucket:{cumulative_count:0 upper_bound:100} bucket:{cumulative_count:0 upper_bound:250} bucket:{cumulative_count:0 upper_bound:500} bucket:{cumulative_count:0 upper_bound:750} bucket:{cumulative_count:0 upper_bound:1000} bucket:{cumulative_count:0 upper_bound:2500} bucket:{cumulative_count:2 upper_bound:5000} bucket:{cumulative_count:2 upper_bound:7500} bucket:{cumulative_count:2 upper_bound:10000}}} was collected before with the same name and label values
  • collected metric "api_call" { label:{name:"method" value:"GETT"} label:{name:"otel_scope_name" value:"github.com/go-skynet/LocalAI"} label:{name:"otel_scope_version" value:""} label:{name:"path" value:"/v1/chat/completions"} histogram:{sample_count:1 sample_sum:18.822380547 bucket:{cumulative_count:0 upper_bound:0} bucket:{cumulative_count:0 upper_bound:5} bucket:{cumulative_count:0 upper_bound:10} bucket:{cumulative_count:1 upper_bound:25} bucket:{cumulative_count:1 upper_bound:50} bucket:{cumulative_count:1 upper_bound:75} bucket:{cumulative_count:1 upper_bound:100} bucket:{cumulative_count:1 upper_bound:250} bucket:{cumulative_count:1 upper_bound:500} bucket:{cumulative_count:1 upper_bound:750} bucket:{cumulative_count:1 upper_bound:1000} bucket:{cumulative_count:1 upper_bound:2500} bucket:{cumulative_count:1 upper_bound:5
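The error is raised when /metrics is scraped, not when the completion requests are served: during a scrape, every series must be unique by metric name plus label values, and each repeat produces one "was collected before with the same name and label values" entry. A minimal stdlib-only Go model of that per-scrape check follows; `Sample`, `seriesKey` and `gather` are hypothetical names, and histogram buckets are omitted.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a simplified stand-in for one collected series: a metric name plus
// its label pairs (bucket data omitted).
type Sample struct {
	Name   string
	Labels map[string]string
}

// seriesKey builds a canonical identity from the name and the sorted label
// pairs, so the order in which labels were added does not matter.
func seriesKey(s Sample) string {
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(s.Name)
	for _, k := range keys {
		fmt.Fprintf(&b, "|%s=%q", k, s.Labels[k])
	}
	return b.String()
}

// gather returns one error for every sample whose (name, labels) identity was
// already seen in the same scrape.
func gather(samples []Sample) []error {
	seen := make(map[string]bool)
	var errs []error
	for _, s := range samples {
		k := seriesKey(s)
		if seen[k] {
			errs = append(errs, fmt.Errorf("collected metric %q %v was collected before with the same name and label values", s.Name, s.Labels))
			continue
		}
		seen[k] = true
	}
	return errs
}

func main() {
	labels := map[string]string{"method": "POST", "path": "/v1/chat/completions"}
	errs := gather([]Sample{
		{Name: "api_call", Labels: labels},
		{Name: "api_call", Labels: labels}, // same name and labels: rejected
	})
	fmt.Println(len(errs)) // prints 1
}
```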

Additional context

We use llama.cpp as the backend, with parallel requests enabled:

PARALLEL_REQUESTS=true
LLAMACPP_PARALLEL=10

A PowerShell benchmark script to generate load on the system:

Measure-Command { 
    1..10 | % { `
        Start-Job -ScriptBlock { `
            curl.exe [...]/v1/chat/completions `
                --header "Content-Type: application/json" `
                --header "@${HOME}\authorization_header.txt" `
                --data '{
                    \"model\": \"dolphin-2_2-yi-34b.Q4_K_M\",
                    \"messages\": [
                        {
                            \"role\": \"system\",
                            \"content\": \"You are a helpful assistant.\"
                        },
                        {
                            \"role\": \"user\",
                            \"content\": \"How are you?\"
                        }
                    ],
                    \"temperature\": 0.7
                }'
        }
    }
    Get-Job | Wait-Job | Receive-Job | Out-Host 
}

countzero, Dec 15 '23

The error is still reproducible with https://github.com/mudler/LocalAI/releases/tag/v2.14.0

curl -v https://[...]/metrics

HTTP/1.1 500 Internal Server Error
Alt-Svc: h3=":443"; ma=2592000
Content-Length: 5961
Content-Type: text/plain; charset=utf-8
Date: Tue, 07 May 2024 11:46:26 GMT
Server: Caddy
X-Content-Type-Options: nosniff

An error has occurred while serving metrics:

6 error(s) occurred:

  • collected metric "api_call" { label:{name:"method" value:"GET"} label:{name:"otel_scope_name" value:"github.com/go-skynet/LocalAI"} label:{name:"otel_scope_version" value:""} label:{name:"path" value:"/readyz"} histogram:{sample_count:3 sample_sum:8.935999999999999e-06 bucket:{cumulative_count:0 upper_bound:0} bucket:{cumulative_count:3 upper_bound:5} bucket:{cumulative_count:3 upper_bound:10} bucket:{cumulative_count:3 upper_bound:25} bucket:{cumulative_count:3 upper_bound:50} bucket:{cumulative_count:3 upper_bound:75} bucket:{cumulative_count:3 upper_bound:100} bucket:{cumulative_count:3 upper_bound:250} bucket:{cumulative_count:3 upper_bound:500} bucket:{cumulative_count:3 upper_bound:750} bucket:{cumulative_count:3 upper_bound:1000} bucket:{cumulative_count:3 upper_bound:2500} bucket:{cumulative_count:3 upper_bound:5000} bucket:{cumulative_count:3 upper_bound:7500} bucket:{cumulative_count:3 upper_bound:10000}}} was collected before with the same name and label values
  • collected metric "api_call" { label:{name:"method" value:"GET"} label:{name:"otel_scope_name" value:"github.com/go-skynet/LocalAI"} label:{name:"otel_scope_version" value:""} label:{name:"path" value:"/wp-config."} histogram:{sample_count:1 sample_sum:3.6339e-05 bucket:{cumulative_count:0 upper_bound:0} bucket:{cumulative_count:1 upper_bound:5} bucket:{cumulative_count:1 upper_bound:10} bucket:{cumulative_count:1 upper_bound:25} bucket:{cumulative_count:1 upper_bound:50} bucket:{cumulative_count:1 upper_bound:75} bucket:{cumulative_count:1 upper_bound:100} bucket:{cumulative_count:1 upper_bound:250} bucket:{cumulative_count:1 upper_bound:500} bucket:{cumulative_count:1 upper_bound:750} bucket:{cumulative_count:1 upper_bound:1000} bucket:{cumulative_count:1 upper_bound:2500} bucket:{cumulative_count:1 upper_bound:5000} bucket:{cumulative_count:1 upper_bound:7500} bucket:{cumulative_count:1 upper_bound:10000}}} was collected before with the same name and label values
  • collected metric "api_call" { label:{name:"method" value:"GET"} label:{name:"otel_scope_name" value:"github.com/go-skynet/LocalAI"} label:{name:"otel_scope_version" value:""} label:{name:"path" value:"/readyzl/pri"} histogram:{sample_count:1 sample_sum:2.7692e-05 bucket:{cumulative_count:0 upper_bound:0} bucket:{cumulative_count:1 upper_bound:5} bucket:{cumulative_count:1 upper_bound:10} bucket:{cumulative_count:1 upper_bound:25} bucket:{cumulative_count:1 upper_bound:50} bucket:{cumulative_count:1 upper_bound:75} bucket:{cumulative_count:1 upper_bound:100} bucket:{cumulative_count:1 upper_bound:250} bucket:{cumulative_count:1 upper_bound:500} bucket:{cumulative_count:1 upper_bound:750} bucket:{cumulative_count:1 upper_bound:1000} bucket:{cumulative_count:1 upper_bound:2500} bucket:{cumulative_count:1 upper_bound:5000} bucket:{cumulative_count:1 upper_bound:7500} bucket:{cumulative_count:1 upper_bound:10000}}} was collected before with the same name and label values
  • collected metric "api_call" { label:{name:"method" value:"GET"} label:{name:"otel_scope_name" value:"github.com/go-skynet/LocalAI"} label:{name:"otel_scope_version" value:""} label:{name:"path" value:"/read"} histogram:{sample_count:1 sample_sum:2.4356e-05 bucket:{cumulative_count:0 upper_bound:0} bucket:{cumulative_count:1 upper_bound:5} bucket:{cumulative_count:1 upper_bound:10} bucket:{cumulative_count:1 upper_bound:25} bucket:{cumulative_count:1 upper_bound:50} bucket:{cumulative_count:1 upper_bound:75} bucket:{cumulative_count:1 upper_bound:100} bucket:{cumulative_count:1 upper_bound:250} bucket:{cumulative_count:1 upper_bound:500} bucket:{cumulative_count:1 upper_bound:750} bucket:{cumulative_count:1 upper_bound:1000} bucket:{cumulative_count:1 upper_bound:2500} bucket:{cumulative_count:1 upper_bound:5000} bucket:{cumulative_count:1 upper_bound:7500} bucket:{cumulative_count:1 upper_bound:10000}}} was collected before with the same name and label values
  • collected metric "api_call" { label:{name:"method" value:"GET"} label:{name:"otel_scope_name" value:"github.com/go-skynet/LocalAI"} label:{name:"otel_scope_version" value:""} label:{name:"path" value:"/wp-config."} histogram:{sample_count:1 sample_sum:2.5208e-05 bucket:{cumulative_count:0 upper_bound:0} bucket:{cumulative_count:1 upper_bound:5} bucket:{cumulative_count:1 upper_bound:10} bucket:{cumulative_count:1 upper_bound:25} bucket:{cumulative_count:1 upper_bound:50} bucket:{cumulative_count:1 upper_bound:75} bucket:{cumulative_count:1 upper_bound:100} bucket:{cumulative_count:1 upper_bound:250} bucket:{cumulative_count:1 upper_bound:500} bucket:{cumulative_count:1 upper_bound:750} bucket:{cumulative_count:1 upper_bound:1000} bucket:{cumulative_count:1 upper_bound:2500} bucket:{cumulative_count:1 upper_bound:5000} bucket:{cumulative_count:1 upper_bound:7500} bucket:{cumulative_count:1 upper_bound:10000}}} was collected before with the same name and label values
  • collected metric "api_call" { label:{name:"method" value:"GET"} label:{name:"otel_scope_name" value:"github.com/go-skynet/LocalAI"} label:{name:"otel_scope_version" value:""} label:{name:"path" value:"/wp-config."} histogram:{sample_count:1 sample_sum:2.2442e-05 bucket:{cumulative_count:0 upper_bound:0} bucket:{cumulative_count:1 upper_bound:5} bucket:{cumulative_count:1 upper_bound:10} bucket:{cumulative_count:1 upper_bound:25} bucket:{cumulative_count:1 upper_bound:50} bucket:{cumulative_count:1 upper_bound:75} bucket:{cumulative_count:1 upper_bound:100} bucket:{cumulative_count:1 upper_bound:250} bucket:{cumulative_count:1 upper_bound:500} bucket:{cumulative_count:1 upper_bound:750} bucket:{cumulative_count:1 upper_bound:1000} bucket:{cumulative_count:1 upper_bound:2500} bucket:{cumulative_count:1 upper_bound:5000} bucket:{cumulative_count:1 upper_bound:7500} bucket:{cumulative_count:1 upper_bound:10000}}} was collected before with the same name and label values
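Several label values in these logs look corrupted: the method "GETT" in the first report, and paths such as "/readyzl/pri", "/read" and several separate series all labelled "/wp-config.". One possible explanation, offered only as an assumption and not confirmed anywhere in this issue, is that label strings alias a request buffer that the HTTP framework reuses (Fiber, for example, documents that values returned from its context are reused across requests unless copied). A stored label could then change after its series was created, so series that were distinct when recorded end up with identical label values at scrape time. The sketch reproduces that effect with a deliberately unsafe zero-copy conversion (Go 1.20+); `zeroCopyString` and `recordPaths` are hypothetical names.

```go
package main

import (
	"fmt"
	"unsafe"
)

// zeroCopyString converts b to a string without copying, so the string shares
// b's memory. This deliberately breaks the rule that the bytes must not be
// modified afterwards, to show what happens when a buffer is reused.
func zeroCopyString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

// recordPaths simulates a server that reuses one request buffer and keeps each
// request path as a metric label, either aliased (zero-copy) or copied.
func recordPaths(paths []string, copyLabel bool) []string {
	buf := make([]byte, 0, 64) // one buffer reused for every "request"
	var labels []string
	for _, p := range paths {
		buf = append(buf[:0], p...) // overwrite the buffer in place
		if copyLabel {
			labels = append(labels, string(buf)) // string(buf) copies the bytes
		} else {
			labels = append(labels, zeroCopyString(buf)) // aliases buf
		}
	}
	return labels
}

func main() {
	paths := []string{"/readyz", "/models"} // same length on purpose
	fmt.Println(recordPaths(paths, false)) // prints [/models /models]
	fmt.Println(recordPaths(paths, true))  // prints [/readyz /models]
}
```

With aliasing, both recorded labels read back as "/models", which is exactly the kind of duplicate the scrape rejects; copying the value before keeping it avoids the collision.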

countzero, May 07 '24

I ran into this while adding a service monitor to the Helm chart.

At first everything seems fine, but after some load I get an error like this:

An error has occurred while serving metrics:

collected metric "api_call" { label:{name:"method"  value:"POST"}  label:{name:"otel_scope_name"  value:"github.com/go-skynet/LocalAI"}  label:{name:"otel_scope_version"  value:""}  label:{name:"path"  value:"/chat/completions"}  histogram:{sample_count:3  sample_sum:4.015941837  bucket:{cumulative_count:0  upper_bound:0}  bucket:{cumulative_count:3  upper_bound:5}  bucket:{cumulative_count:3  upper_bound:10}  bucket:{cumulative_count:3  upper_bound:25}  bucket:{cumulative_count:3  upper_bound:50}  bucket:{cumulative_count:3  upper_bound:75}  bucket:{cumulative_count:3  upper_bound:100}  bucket:{cumulative_count:3  upper_bound:250}  bucket:{cumulative_count:3  upper_bound:500}  bucket:{cumulative_count:3  upper_bound:750}  bucket:{cumulative_count:3  upper_bound:1000}  bucket:{cumulative_count:3  upper_bound:2500}  bucket:{cumulative_count:3  upper_bound:5000}  bucket:{cumulative_count:3  upper_bound:7500}  bucket:{cumulative_count:3  upper_bound:10000}}} was collected before with the same name and label values

Nold360, May 27 '24

I have the same problem with the image localai/localai:v2.21.1-cublas-cuda12-core, but only when using llama.cpp with parallel requests enabled. So the metrics get generated for every slot, but with the same name and labels.
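The per-slot hypothesis can be illustrated with a minimal sketch; it is not LocalAI's actual code, and `seriesID`, `hist`, `store`, `export` and `run` are hypothetical names. If each of the 10 parallel slots (as with `LLAMACPP_PARALLEL=10`) kept its own accumulator for the same `api_call` series, a scrape would see that name and label set 10 times and reject 9 of them. A single shared, mutex-protected accumulator instead yields one series whose count is 10.

```go
package main

import (
	"fmt"
	"sync"
)

// seriesID is the identity a scrape sees: metric name plus label values.
type seriesID struct{ Name, Method, Path string }

// hist is a minimal histogram stand-in that tracks only count and sum.
type hist struct {
	Count int
	Sum   float64
}

// store holds one hist per series, guarded by a mutex for concurrent slots.
type store struct {
	mu     sync.Mutex
	series map[seriesID]*hist
}

func newStore() *store { return &store{series: map[seriesID]*hist{}} }

// observe records one request duration (in seconds) for the given series.
func (s *store) observe(id seriesID, seconds float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	h, ok := s.series[id]
	if !ok {
		h = &hist{}
		s.series[id] = h
	}
	h.Count++
	h.Sum += seconds
}

// export flattens all stores into the series a scrape would see and counts
// how many of them repeat an already exported series ID.
func export(stores []*store) (exported, duplicates int) {
	seen := map[seriesID]bool{}
	for _, s := range stores {
		s.mu.Lock()
		for id := range s.series {
			exported++
			if seen[id] {
				duplicates++
			}
			seen[id] = true
		}
		s.mu.Unlock()
	}
	return exported, duplicates
}

// run lets `slots` goroutines each record one request, using either one
// store per slot or a single shared store.
func run(slots int, perSlot bool) (exported, duplicates int) {
	shared := newStore()
	stores := []*store{shared}
	if perSlot {
		stores = nil
		for i := 0; i < slots; i++ {
			stores = append(stores, newStore())
		}
	}
	id := seriesID{"api_call", "POST", "/v1/chat/completions"}
	var wg sync.WaitGroup
	for i := 0; i < slots; i++ {
		wg.Add(1)
		go func(slot int) {
			defer wg.Done()
			s := shared
			if perSlot {
				s = stores[slot]
			}
			s.observe(id, 1.5)
		}(i)
	}
	wg.Wait()
	return export(stores)
}

func main() {
	fmt.Println(run(10, true))  // prints "10 9"
	fmt.Println(run(10, false)) // prints "1 0"
}
```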

TheVoidArbiter, Oct 01 '24