Could you describe what each metric means? There are a few I don't quite understand.
Benchmarking summary:
  Time taken for tests: 22.512 seconds
  Expected number of requests: 100
  Number of concurrency: 128
  Total requests: 100
  Succeed requests: 100
  Failed requests: 0
  Average QPS: 4.442
  Average latency: 14.140
  Throughput(average output tokens per second): 891.275
  Average time to first token: 2.701
  Average input tokens per request: 28.890
  Average output tokens per request: 200.640
  Average time per output token: 0.00112
  Average package per request: 191.830
  Average package latency: 0.060
  Percentile of time to first token:
    p50: 2.7137  p66: 2.7370  p75: 2.7879  p80: 2.8042  p90: 2.8816  p95: 2.9215  p98: 2.9364  p99: 2.9847
  Percentile of request latency:
    p50: 14.7637  p66: 17.0512  p75: 17.7525  p80: 18.3740  p90: 19.7777  p95: 20.1707  p98: 21.1066  p99: 22.5016
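The averaged fields in this summary look like simple ratios over the successful requests and the total wall-clock test time. The following is a minimal sketch under that assumption. It is not evalscope's actual code. `RequestRecord`, `summarize`, and `percentile` are hypothetical names. The formulas are inferred from the reported numbers rather than copied from the evalscope/perf source. A "package" is assumed to be one streamed response chunk. The nearest-rank `percentile` may differ from the interpolation evalscope really uses.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical per-request record; field names are illustrative only."""
    success: bool
    latency: float          # seconds from sending the request to the last chunk
    ttft: float             # seconds from sending the request to the first chunk
    prompt_tokens: int      # input tokens
    completion_tokens: int  # output tokens
    n_chunks: int           # streamed chunks ("packages") received


def summarize(records, total_time):
    """Assumed formulas for the summary fields; total_time is the whole test's wall-clock time."""
    ok = [r for r in records if r.success]
    n = len(ok)
    if n == 0:
        raise ValueError("no successful requests")
    total_out = sum(r.completion_tokens for r in ok)
    total_chunks = sum(r.n_chunks for r in ok)
    return {
        "Succeed requests": n,
        "Failed requests": len(records) - n,
        # successful requests completed per second of test time
        "Average QPS": n / total_time,
        # mean end-to-end latency of one request
        "Average latency": sum(r.latency for r in ok) / n,
        # all output tokens divided by total test time (aggregate, not per stream)
        "Throughput": total_out / total_time,
        "Average time to first token": sum(r.ttft for r in ok) / n,
        "Average input tokens per request": sum(r.prompt_tokens for r in ok) / n,
        "Average output tokens per request": total_out / n,
        # inverse of Throughput: total test time per output token
        "Average time per output token": total_time / total_out,
        "Average package per request": total_chunks / n,
        # assumed: streaming time after the first chunk, spread over all chunks
        "Average package latency": sum(r.latency - r.ttft for r in ok) / total_chunks,
    }


def percentile(values, p):
    """Nearest-rank percentile (an assumption; evalscope may interpolate differently)."""
    ordered = sorted(values)
    k = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[k - 1]
```

Plugging the reported values into these formulas gives matching figures:

- Average QPS: 100 / 22.512 ≈ 4.442
- Throughput: 100 × 200.640 / 22.512 ≈ 891
- Average time per output token: 22.512 / 20064 ≈ 0.00112
- Average package latency: (14.140 - 2.701) / 191.830 ≈ 0.060

Number of concurrency (128) exceeds Total requests (100), so presumably all requests were in flight at the same time. That makes Average time per output token an aggregate figure (the inverse of Throughput). The per-stream decode time implied by the same numbers is closer to (14.140 - 2.701) / 200.640 ≈ 0.057 s per token.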
OK, let's add more detail to that.
For reference, see: https://github.com/modelscope/evalscope/tree/main/evalscope/perf