
[PD Disaggregation] Decode uses a CPU buffer to receive cache from prefill

Open · juncaipeng opened this issue 2 months ago • 1 comment

Motivation

In PD disaggregation, the decode instance uses a CPU buffer to receive the cache sent from the prefill instance.

Modifications

  • Add create_pinned_shm and open_pinned_shm.
  • Support the splitwise CPU cache buffer in cache_messager and cache_transfer_manager.
  • Support the splitwise CPU cache buffer in resource_manager_v1 and prefix_cache_manager.py.
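
For readers unfamiliar with the create/open pattern behind a shared CPU buffer, here is a minimal sketch using only Python's standard multiprocessing.shared_memory module. It is not the code in this PR: the helper names create_shm_buffer and open_shm_buffer, the segment name, and the 1024-byte size are all hypothetical, and the page-locking (pinning) step that create_pinned_shm presumably performs is left out because it needs a GPU runtime.

```python
import os
from multiprocessing import shared_memory


def create_shm_buffer(name: str, size_bytes: int) -> shared_memory.SharedMemory:
    """Create a new named shared-memory segment (hypothetical helper, no pinning)."""
    return shared_memory.SharedMemory(name=name, create=True, size=size_bytes)


def open_shm_buffer(name: str) -> shared_memory.SharedMemory:
    """Attach to an existing named shared-memory segment (hypothetical helper)."""
    return shared_memory.SharedMemory(name=name, create=False)


def demo_roundtrip() -> bytes:
    """Write through one handle and read the same bytes back through another."""
    # Hypothetical segment name; the PID suffix avoids clashes between runs.
    name = f"demo_kv_buffer_{os.getpid()}"
    owner = create_shm_buffer(name, 1024)
    try:
        peer = open_shm_buffer(name)
        try:
            owner.buf[:4] = b"kv01"      # writer side fills the buffer
            return bytes(peer.buf[:4])   # reader side sees the same memory
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()                   # the creator removes the segment


if __name__ == "__main__":
    print(demo_roundtrip())
```

Here both handles live in one process to keep the example short. In a real deployment the creator and the opener would normally be separate processes that agree on the segment name.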

Usage or Command

On the decode instance, set the --splitwise-cache-buffer-size argument, for example --splitwise-cache-buffer-size 10.

Accuracy Tests

Refer to the unit tests.

Checklist

  • [x] Add at least a tag in the PR title.
    • Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • [x] Format your code and run pre-commit before committing.
  • [x] Add unit tests. Please write the reason in this PR if no unit tests.
  • [x] Provide accuracy results.
  • [x] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

juncaipeng · Nov 25 '25 07:11

Thanks for your contribution!

paddle-bot[bot] · Nov 25 '25 07:11