FastDeploy
[PD Disaggregation] Decode uses a CPU buffer to receive cache from prefill
Motivation
In PD disaggregation, the decode instance uses a CPU buffer to receive the cache sent by the prefill instance.
Modifications
- Add `create_pinned_shm` and `open_pinned_shm`.
- Add support for the splitwise CPU cache buffer in `cache_messager` and `cache_transfer_manager`.
- Add support for the splitwise CPU cache buffer in `resource_manager_v1` and `prefix_cache_manager.py`.
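As a rough illustration of the create/open pattern that the two new helper names suggest, the sketch below uses Python's standard `multiprocessing.shared_memory` module. It is not the PR's implementation: `create_shm_buffer`, `open_shm_buffer`, and the segment name are hypothetical, the pinning (page-locking) step is omitted entirely, and which side creates the segment in FastDeploy is not stated here.

```python
import os
from multiprocessing import shared_memory


def create_shm_buffer(name: str, size_bytes: int) -> shared_memory.SharedMemory:
    """Create a named shared-memory segment (hypothetical helper, not the PR's API)."""
    return shared_memory.SharedMemory(name=name, create=True, size=size_bytes)


def open_shm_buffer(name: str) -> shared_memory.SharedMemory:
    """Attach to an existing segment by name (hypothetical helper, not the PR's API)."""
    return shared_memory.SharedMemory(name=name, create=False)


def demo() -> bytes:
    # Assumption: a per-process name avoids collisions with leftover segments.
    name = f"kv_demo_{os.getpid()}"
    block = b"kv-block-0"  # stand-in for cache bytes received from prefill

    owner = create_shm_buffer(name, 1024)
    try:
        # One side writes the received bytes into the CPU buffer.
        owner.buf[: len(block)] = block

        # Another handle attaches by name and reads the same bytes.
        reader = open_shm_buffer(name)
        try:
            received = bytes(reader.buf[: len(block)])
        finally:
            reader.close()
    finally:
        owner.close()
        owner.unlink()  # release the segment once nobody needs it
    return received


if __name__ == "__main__":
    print(demo())
```

In a real deployment the two handles would live in different processes, and the buffer would additionally need to be page-locked for fast host-to-device copies; both points are outside this sketch.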
Usage or Command
On the decode instance, pass the `--splitwise-cache-buffer-size` argument, for example `--splitwise-cache-buffer-size 10`.
Accuracy Tests
Refer to the unit tests.
Checklist
- [x] Add at least a tag in the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Thanks for your contribution!