Shaoting issues

Results: 5 issues by Shaoting

[CI/Build][v1] vLLM v1 automatic benchmarking

This PR extends the performance benchmark to include both v0 and v1. The latency, throughput, and fixed-QPS serving tests will first run with v0 and then with v1. The results...

perf-benchmarks

ready

ci/build

[Bug] LMCache always tries to store all tokens even if they are stored and loaded

**Describe the bug** When there is a cache hit, LMCache can load the KV cache of the hit tokens. But LMCache will then try to store the KV cache of those same tokens...

bug

good first issue
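The behavior reported in the LMCache issue above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyKVStore`, `lookup`, `store`, and the `skip_hit` flag are hypothetical names made up for illustration and are not LMCache's API. It assumes the desired behavior is to write only the tokens that follow the already-cached prefix.

```python
from typing import Dict, List, Tuple


class ToyKVStore:
    """Hypothetical in-memory stand-in for a KV cache store (not LMCache)."""

    def __init__(self) -> None:
        # Maps a token prefix to a placeholder for that position's KV data.
        self.entries: Dict[Tuple[int, ...], str] = {}
        self.store_calls = 0  # counts individual writes

    def lookup(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached entries."""
        hit = 0
        for end in range(1, len(tokens) + 1):
            if tuple(tokens[:end]) in self.entries:
                hit = end
            else:
                break
        return hit

    def store(self, tokens: List[int], skip_hit: bool = True) -> int:
        """Write entries for `tokens`; return the number of writes performed.

        With skip_hit=True, tokens covered by a cache hit are not rewritten.
        With skip_hit=False, every token is written again, which mirrors the
        redundant store described in the issue.
        """
        start = self.lookup(tokens) if skip_hit else 0
        for end in range(start + 1, len(tokens) + 1):
            self.entries[tuple(tokens[:end])] = f"kv[{end - 1}]"
            self.store_calls += 1
        return len(tokens) - start


cache = ToyKVStore()
first = cache.store([1, 2, 3])               # cold cache: all tokens written
second = cache.store([1, 2, 3, 4])           # prefix hit: only the new token
redundant = cache.store([1, 2, 3, 4], skip_hit=False)  # rewrites everything
```

In this toy model the second call writes a single entry, while the `skip_hit=False` call rewrites all four, which is the kind of redundant work the issue describes.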

[Benchmark] Update data_preprocessing.py

FILL IN THE PR DESCRIPTION HERE FIX #xxxx (*link existing issues this PR will resolve*) **BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE** --- PR...

[Performance]: Prefill TTFT and latency both increased

### Proposal to improve performance

_No response_

### Report of performance regression

With the following commands:

```bash
python benchmark_long_document_qa_throughput.py \
    --model meta-llama/Llama-3.2-1B-Instruct
```

The first round (no cache hit) performance...

performance

[Doc] Fix local disk URI scheme in documentation

**What this PR does / why we need it**: **Special notes for your reviewers**: **If applicable**: - [ ] this PR contains user facing changes - docs added - [...

full