Rain Jiang
I made a tutorial (https://github.com/bytedance-iaas/splitwise-demos) explaining how to integrate LMCache with vLLM to support cross-server PD disaggregation. This demo can be expanded to nPmD and independently...
Confirmed that https://github.com/sgl-project/sglang/pull/6233 fixes the issue. ``` Traffic request rate: 5.0 Burstiness factor: 1.0 (Poisson process) Maximum request concurrency: 236 100%|██████████████████████████████████████████████████████| 944/944 [07:16
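For readers unfamiliar with the "Burstiness factor: 1.0 (Poisson process)" line in that output, here is a minimal sketch of how such a factor is commonly interpreted. It assumes the benchmark draws inter-arrival gaps from a gamma distribution whose shape equals the burstiness factor; that is an assumption about the tool, not something confirmed by the log, and the function name `arrival_intervals` is hypothetical. With shape 1.0 the gamma distribution reduces to the exponential distribution, which gives Poisson arrivals.

```python
import random


def arrival_intervals(request_rate, burstiness=1.0, n=1000, seed=0):
    """Sample n inter-arrival gaps (seconds) for a target request rate.

    Assumption: gaps follow Gamma(shape=burstiness, scale=1/(rate*burstiness)),
    so the mean gap is always 1/request_rate. burstiness == 1.0 is the
    exponential case, i.e. a Poisson arrival process.
    """
    rng = random.Random(seed)
    theta = 1.0 / (request_rate * burstiness)  # scale parameter
    return [rng.gammavariate(burstiness, theta) for _ in range(n)]


# Same settings as the log above: 5 requests/s, burstiness 1.0.
intervals = arrival_intervals(5.0, 1.0, n=20000)
mean_gap = sum(intervals) / len(intervals)
print(f"mean gap: {mean_gap:.3f} s (target {1 / 5.0:.3f} s)")
```

Under this assumption, a burstiness below 1.0 would make arrivals more clustered and a value above 1.0 more evenly spaced, while the average rate stays the same.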
Is there any plan for CuTe DSL support for the following kernels?
1. RoPE + Set KvCache FP8/FP4
2. Fuse Attention on FP8/FP4
3. General MoE on FP8/FP4
4. Fuse...
> > Is there any plan for Cute DSL support on following kernels ?
>
> [@rainj-me](https://github.com/rainj-me) when choosing how to implement kernels in FlashInfer, we consider a number of...
This commit aims to transfer the hidden states from prefill to decode via LMCache (with remote KV cache storage), as shown in the following diagram: ``` +----------------+ +----------------+ | | | | |...
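As a rough illustration of the hand-off described above, the sketch below has a prefill step write hidden states to a shared store and a decode step read them back by request ID. Everything here is hypothetical: `RemoteStore`, `prefill`, `decode`, and the key layout are stand-ins for illustration, not LMCache or vLLM APIs, and the "hidden states" are placeholder numbers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RemoteStore:
    """Hypothetical in-memory stand-in for a remote KV cache server."""
    _data: dict = field(default_factory=dict)

    def put(self, key: str, value: list) -> None:
        self._data[key] = list(value)  # copy, as a network store would

    def get(self, key: str) -> Optional[list]:
        return self._data.get(key)


def prefill(store: RemoteStore, request_id: str, prompt_tokens: list) -> None:
    # Placeholder for the model forward pass on the prefill instance.
    hidden_states = [0.5 * t for t in prompt_tokens]
    # Only the last position is needed to start decoding in this sketch.
    store.put(f"{request_id}/hidden_states", hidden_states[-1:])


def decode(store: RemoteStore, request_id: str) -> list:
    hidden = store.get(f"{request_id}/hidden_states")
    if hidden is None:
        raise KeyError(f"no hidden states stored for {request_id}")
    return hidden


store = RemoteStore()
prefill(store, "req-1", [2, 4, 6])
print(decode(store, "req-1"))
```

The point of the sketch is only the data flow: the two instances never talk to each other directly, and the store key derived from the request ID is what links them.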
> @chenqianfzh @rainj-me Just curious, how much overhead will it introduce if we do not save KV cache but let decoding instance to decode 1 token

The problem is the...
> @chenqianfzh Thanks for your kindly help, this is very useful for me, and I have run successfully vLLM+LMCache DP disagg by your document, thanks so much!
>
> BTW,...
> @chenqianfzh Thanks for your help, so `LMCache` still cannot work with deepseek with MLA? Is there a way to let `LMCache` support MLA ?

@chenqianfzh and I are working...
> @chenqianfzh When I disable MLA, the log still exist, maybe we should log this ERROR only when enable MLA? > > ``` > ERROR LMCache: Failed to retrieve the...
Added a new commit to fix the unnecessary hidden states store/retrieve.