Rain Jiang

Results 11 comments of Rain Jiang

I made a tutorial explaining how to integrate LMCache with vLLM to support cross-server PD disaggregation: https://github.com/bytedance-iaas/splitwise-demos. This demo can be expanded to nPmD and independently...

Confirmed that https://github.com/sgl-project/sglang/pull/6233 fixes the issue. ``` Traffic request rate: 5.0 Burstiness factor: 1.0 (Poisson process) Maximum request concurrency: 236 100%|██████████████████████████████████████████████████████| 944/944 [07:16

Is there any plan for Cute DSL support for the following kernels? 1. RoPE + Set KvCache FP8/FP4 2. Fuse Attention on FP8/FP4 3. General MoE on FP8/FP4 4. Fuse...

> > Is there any plan for Cute DSL support on following kernels ? > > [@rainj-me](https://github.com/rainj-me) when choosing how to implement kernels in FlashInfer, we consider a number of...

This commit attempts to transfer the hidden states from prefill to decode via LMCache (with remote KV cache storage), as shown in the following diagram: ``` +----------------+ +----------------+ | | | | |...

> @chenqianfzh @rainj-me Just curious, how much overhead will it introduce if we do not save KV cache but let decoding instance to decode 1 token The problem is the...

> @chenqianfzh Thanks for your kindly help, this is very useful for me, and I have run successfully vLLM+LMCache DP disagg by your document, thanks so much! > > BTW,...

> @chenqianfzh Thanks for your help, so `LMCache` still cannot work with deepseek with MLA? Is there a way to let `LMCache` support MLA ? @chenqianfzh and I are working...

> @chenqianfzh When I disable MLA, the log still exist, maybe we should log this ERROR only when enable MLA? > > ``` > ERROR LMCache: Failed to retrieve the...

Added a new commit to fix the unnecessary hidden-state store/retrieve.