[PD Disaggregation] Decode uses a CPU buffer to receive cache from prefill

Open · juncaipeng opened this issue 2 months ago · 1 comment

Motivation

The decode instance uses a CPU buffer to receive the cache sent by the prefill instance.

Add create_pinned_shm and open_pinned_shm.
cache_messager and cache_transfer_manager support the splitwise CPU cache buffer.
resource_manager_v1 and prefix_cache_manager.py support the splitwise CPU cache buffer.
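
The create/open pairing named above can be illustrated with a minimal sketch that uses only Python's standard library. The helpers `create_shm` and `open_shm`, and the segment name, are hypothetical stand-ins and not the PR's `create_pinned_shm` / `open_pinned_shm`. Page-locking ("pinning") the memory for device transfers is not shown, since it needs a GPU runtime. The sketch only assumes that one process creates a named segment and another attaches to it by name.

```python
import os
from multiprocessing import shared_memory


def create_shm(name: str, size: int) -> shared_memory.SharedMemory:
    """Create a named shared-memory segment (hypothetical helper, no pinning)."""
    return shared_memory.SharedMemory(name=name, create=True, size=size)


def open_shm(name: str) -> shared_memory.SharedMemory:
    """Attach to an existing segment by name (hypothetical helper)."""
    return shared_memory.SharedMemory(name=name, create=False)


def demo() -> bytes:
    # Hypothetical segment name; the PID keeps repeated runs from colliding.
    name = f"splitwise_cache_demo_{os.getpid()}"
    owner = create_shm(name, 1024)
    try:
        # The creating side writes some bytes into the buffer.
        owner.buf[:5] = b"cache"
        # The attaching side opens the same segment by name and reads them.
        reader = open_shm(name)
        try:
            return bytes(reader.buf[:5])
        finally:
            reader.close()
    finally:
        owner.close()
        owner.unlink()  # Remove the segment once it is no longer needed.


if __name__ == "__main__":
    print(demo())
```

In a real deployment the two sides would be separate processes; both are shown in one process here only to keep the example self-contained.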

The decode instance can set the --splitwise-cache-buffer-size argument, for example --splitwise-cache-buffer-size 10.

Refer to the unit tests.

[x] Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
[x] Format your code, run pre-commit before commit.
[x] Add unit tests. Please write the reason in this PR if no unit tests.
[x] Provide accuracy results.
[x] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Nov 25 '25 07:11 juncaipeng

Thanks for your contribution!

Nov 25 '25 07:11 paddle-bot[bot]