Benchmark toolkit support
What would you like to be added:
It would be great to support benchmarking LLM throughput and latency across different backends.
Why is this needed:
Provide performance evidence for users.
Completion requirements:
This enhancement requires the following artifacts:
- [x] Design doc
- [ ] API change
- [x] Docs update
The artifacts should be linked in subsequent comments.
/kind feature
An example would look like:
```yaml
metadata:
  name: llama3-405b-2024-07-01
  namespace: llm
spec:
  endpoint: llm-1.svc.local
  port: 8000
  performance:
    traffic-shape:
      req-rate: 10 qps
      model-type: instruction-tuned-llm/diffusion
      dataset: share-gpt
      input-length: 1024
      max-output-length: 1024
      total-prompts: 1000
      traffic-spike:
        burst: 10m
        req-rate: 20 qps
status:
  status: success
  results: gcs-bucket-1/llama3-405b-2024-07-01
```
Inspired by https://docs.google.com/document/d/1k4Q4X14hW4vftElIuYGDu5KDe2LtV1XammoG-Xi3bbQ/edit
Also see:
- https://github.com/ray-project/llmperf
- https://github.com/run-ai/llmperf
- https://github.com/kubernetes-sigs/inference-perf
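For reference, the `req-rate` / `traffic-spike` parameters in the example could be driven by a simple open-loop load generator. This is only a hypothetical sketch; `sendRequest` stands in for a real HTTP call to the inference endpoint:

```go
package main

import (
	"fmt"
	"time"
)

// sendRequest is a stand-in for a real call to the inference endpoint;
// a real generator would record per-request latency here.
func sendRequest(i int) {
	_ = i
}

// run fires `total` requests at a fixed rate of `reqRate` requests per
// second and returns the number of requests sent.
func run(reqRate float64, total int) int {
	interval := time.Duration(float64(time.Second) / reqRate)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	sent := 0
	for i := 0; i < total; i++ {
		<-ticker.C
		sendRequest(i)
		sent++
	}
	return sent
}

func main() {
	// Steady phase at 100 qps, then a spike at 200 qps, mirroring the
	// traffic-shape/traffic-spike split in the example spec.
	steady := run(100, 10)
	spike := run(200, 10)
	fmt.Printf("sent %d steady + %d spike requests\n", steady, spike)
}
```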
/help
We have the gateway right now, so I think we can push this forward.
/assign
Here are some more references:
- fw-ai/benchmark
- genai-perf
- huggingface/inference-benchmarker
- llm-d
- SGLang
I think we have to clarify the main target and scope of our benchmarking tool first: with the rapid growth of the LLM + Kubernetes community, there are already many tools for benchmarking LLM inference services. For example, llm-d leverages fmperf-project/fmperf to build its benchmark tool suite, and SGLang's OME defines a BenchmarkJob CRD that runs their genai-bench for benchmarking with fine-grained parameters.
Thanks @rudeigerc, your concern makes sense. I'll update the description later and discuss it with you if possible.