
Benchmark toolkit support

Open kerthcet opened this issue 1 year ago • 7 comments

What would you like to be added:

It would be super great to support benchmarking LLM throughput and latency across different inference backends.

Why is this needed:

Provide performance evidence for users comparing backends.

Completion requirements:

This enhancement requires the following artifacts:

  • [x] Design doc
  • [ ] API change
  • [x] Docs update

The artifacts should be linked in subsequent comments.

kerthcet avatar Aug 06 '24 03:08 kerthcet

/kind feature

kerthcet avatar Aug 06 '24 03:08 kerthcet

An example would look like:

```yaml
metadata:
  name: llama3-405b-2024-07-01
  namespace: llm
spec:
  endpoint: llm-1.svc.local
  port: 8000
  performance:
    traffic-shape:
      req-rate: 10 qps
      model-type: instruction-tuned-llm/diffusion
      dataset: share-gpt
      input-length: 1024
      max-output-length: 1024
      total-prompts: 1000
      traffic-spike:
        burst: 10m
        req-rate: 20 qps
status:
  status: success
  results: gcs-bucket-1/llama3-405b-2024-07-01
```

Inspired by https://docs.google.com/document/d/1k4Q4X14hW4vftElIuYGDu5KDe2LtV1XammoG-Xi3bbQ/edit

kerthcet avatar Aug 08 '24 11:08 kerthcet

Also see:

  • https://github.com/ray-project/llmperf
  • https://github.com/run-ai/llmperf
  • https://github.com/kubernetes-sigs/inference-perf

kerthcet avatar Sep 10 '24 00:09 kerthcet

/help

kerthcet avatar Apr 23 '25 02:04 kerthcet

We have gateway right now, I think we can push this forward.

kerthcet avatar Apr 23 '25 02:04 kerthcet

/assign

Here are some more references:

I think we have to clarify the main target and scope of our benchmarking tool first, since many tools for benchmarking LLM inference services already exist, given the rapid growth of the LLM + Kubernetes ecosystem. For example, llm-d leverages fmperf-project/fmperf to build its benchmark suite, and SGLang's OME defines a CRD called BenchmarkJob to run their genai-bench for benchmarking with fine-grained parameters.

rudeigerc avatar Jul 01 '25 05:07 rudeigerc

Thanks @rudeigerc, your concern makes sense. I'll update the description later and discuss it with you if possible.

kerthcet avatar Jul 03 '25 16:07 kerthcet