Shaoting

Results 5 issues of Shaoting

This PR extends the performance benchmark to include both v0 and v1. The latency, throughput, and fixed-QPS serving tests will first run with v0 and then with v1. The results...

perf-benchmarks
ready
ci/build

**Describe the bug**

When there is a cache hit, LMCache can load the KV Cache of the hit tokens. But then LMCache will try to store the KV Cache of them...

bug
good first issue
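The behavior described in the bug report above (loading cached KV entries on a hit, then attempting to store the same tokens again) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not LMCache's actual API: `ToyKVCache`, `chunk_keys`, `CHUNK`, and the `skip_leading` parameter are all hypothetical names introduced here for illustration only. It shows one way a store path could skip chunks that a preceding lookup already reported as cached.

```python
from typing import Dict, List, Tuple

CHUNK = 4  # hypothetical chunk size in tokens (assumption, not an LMCache setting)


def chunk_keys(tokens: List[int]) -> List[Tuple[int, ...]]:
    """Return one key per full chunk; each key is the whole token prefix up to that chunk's end."""
    return [tuple(tokens[: i + CHUNK]) for i in range(0, len(tokens) - CHUNK + 1, CHUNK)]


class ToyKVCache:
    """Hypothetical stand-in for a KV cache store; counts how many chunk writes happen."""

    def __init__(self) -> None:
        self.data: Dict[Tuple[int, ...], str] = {}
        self.store_calls = 0

    def lookup(self, tokens: List[int]) -> int:
        """Return the number of leading tokens whose KV entries are already cached."""
        hit = 0
        for key in chunk_keys(tokens):
            if key not in self.data:
                break
            hit = len(key)
        return hit

    def store(self, tokens: List[int], skip_leading: int = 0) -> None:
        """Write KV entries chunk by chunk, skipping chunks fully covered by skip_leading."""
        for key in chunk_keys(tokens):
            if len(key) <= skip_leading:
                continue  # this chunk was just loaded from the cache; do not write it again
            self.data[key] = f"kv[{len(key) - CHUNK}:{len(key)}]"
            self.store_calls += 1


cache = ToyKVCache()
cache.store(list(range(8)))            # first request: two chunks written
hit = cache.lookup(list(range(12)))    # second request shares the first 8 tokens
cache.store(list(range(12)), skip_leading=hit)  # only the new chunk is written
```

Calling `store` without `skip_leading` on the second request would rewrite the two chunks that were just loaded, which is the kind of redundant work the report describes.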

FILL IN THE PR DESCRIPTION HERE

FIX #xxxx (*link existing issues this PR will resolve*)

**BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE**

---

PR...

### Proposal to improve performance

_No response_

### Report of performance regression

With the following commands:

```bash
python benchmark_long_document_qa_throughput.py \
    --model meta-llama/Llama-3.2-1B-Instruct
```

The first round (no cache hit) performance...

performance

**What this PR does / why we need it**:

**Special notes for your reviewers**:

**If applicable**:

- [ ] this PR contains user facing changes - docs added
- [...

full