Shaoting
This PR extends the performance benchmark to include both v0 and v1. The latency, throughput, and fixed-QPS serving tests will first run with v0 and then with v1. The results...
**Describe the bug** When there is a cache hit, LMCache can load the KV Cache of the hit tokens. But LMCache then tries to store the KV Cache of those same tokens again...
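To make the expected behavior concrete, here is a minimal Python sketch. It is not LMCache's actual code. It assumes that KV caches are stored in fixed-size, prefix-keyed chunks and that the first `num_hit_tokens` tokens were just loaded from the cache. Under those assumptions, only the chunks after the hit prefix need to be stored. The helper `tokens_to_store` and the `chunk_size` parameter are hypothetical names chosen for illustration.

```python
from typing import List


def tokens_to_store(num_tokens: int, num_hit_tokens: int, chunk_size: int) -> List[range]:
    """Hypothetical helper: return the token ranges whose KV cache still needs storing.

    Assumption: KV cache is stored in fixed-size chunks aligned to the prompt
    prefix, and the first `num_hit_tokens` tokens were loaded from the cache,
    so storing those chunks again would be redundant.
    """
    # Round the hit count down to a chunk boundary; a partially hit chunk is
    # treated as a miss and gets stored again.
    start = (num_hit_tokens // chunk_size) * chunk_size
    # Emit one range per remaining chunk; the last chunk may be shorter.
    return [
        range(begin, min(begin + chunk_size, num_tokens))
        for begin in range(start, num_tokens, chunk_size)
    ]


# 1000-token prompt, first 512 tokens hit: only tokens 512..999 need storing.
print(tokens_to_store(1000, 512, 256))
```

The design choice here is to skip whole chunks rather than individual tokens, because a prefix-keyed store can only reuse a chunk when its entire prefix matches.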
### Proposal to improve performance

_No response_

### Report of performance regression

With the following commands:

```bash
python benchmark_long_document_qa_throughput.py \
    --model meta-llama/Llama-3.2-1B-Instruct
```

The first round (no cache hit) performance...