Stary comments

Results: 18 comments by Stary

[WIP]chore: restructure documentation directory

> I'm not sure if it's because this is still a work in progress, but in the current PR it looks like the doc files were simply moved into the...

[WIP]chore: restructure documentation directory

> Another question: will we continue to maintain `doc/zh` in the future? I think retaining just a few necessary documents in Chinese is enough.

feat: add PCIe Relaxed Ordering (RO) support and RDMA traffic class (…

When IBV_ACCESS_RELAXED_ORDERING is set, RDMA write-after-write message order is no longer guaranteed. I'm not sure whether this affects us.
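To make the concern concrete, here is a toy model of the hazard. It is not the verbs API and says nothing about how Mooncake actually signals completion. It assumes a hypothetical receiver that polls a `flag` location written after a `payload` location, and it only shows what such a receiver would observe if the two writes became visible in a different order.

```python
def observed_payload(memory, writes, delivery_order):
    """Apply writes in delivery_order and return the payload that a
    flag-polling reader sees at the moment the flag first becomes 1."""
    observed = None
    for index in delivery_order:
        address, value = writes[index]
        memory[address] = value
        # The hypothetical reader checks the flag after every delivered write.
        if observed is None and memory["flag"] == 1:
            observed = memory["payload"]
    return observed


# The sender issues the payload write first, then the flag write.
writes = [("payload", "new"), ("flag", 1)]

# Ordered delivery: the flag becomes visible only after the payload.
in_order = observed_payload({"payload": "stale", "flag": 0}, writes, [0, 1])

# Reordered delivery: the flag becomes visible before the payload.
reordered = observed_payload({"payload": "stale", "flag": 0}, writes, [1, 0])

print(in_order, reordered)
```

If the receiver only consumes data after an explicit completion notification rather than by polling memory written by a later write, this particular pattern does not arise.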

feat: add PCIe Relaxed Ordering (RO) support and RDMA traffic class (…

> > When IBV_ACCESS_RELAXED_ORDERING is set, RDMA write-after-write message order is no longer guaranteed. I'm not sure whether this affects us. > > @staryxchen Based on our...

[RoadMap] Mooncake Transfer Engine NEXT

Hi @alogfans, can we support multiple transports within a single transfer engine? This would enable us to select the optimal transport for different types of transfer tasks. For example, if a...

[Performance]: Is there any way to measure network transmission latency for each llm request with mooncake or transfer engine?

> Transfer Engine does not perceive the upper-layer LLM request, so you should measure per-request network latency on the inference side. However, the latency can be recorded for each put/get...
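The suggestion in the quoted reply, timing each put/get on the caller side, could look roughly like the sketch below. `LatencyRecorder` and `fake_put` are hypothetical names introduced for illustration only; `fake_put` stands in for whatever put/get call the inference side actually makes and is not a Mooncake API.

```python
import time


class LatencyRecorder:
    """Collects wall-clock latencies (in microseconds) of wrapped calls."""

    def __init__(self):
        self.samples_us = []

    def record(self, fn, *args, **kwargs):
        """Call fn, store how long it took, and return its result."""
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        self.samples_us.append((time.perf_counter() - start) * 1e6)
        return result

    def summary(self):
        """Return count, p50, p99 and max using nearest-rank percentiles."""
        if not self.samples_us:
            return {}
        ordered = sorted(self.samples_us)

        def percentile(fraction):
            index = min(len(ordered) - 1, int(fraction * len(ordered)))
            return ordered[index]

        return {
            "count": len(ordered),
            "p50": percentile(0.50),
            "p99": percentile(0.99),
            "max": ordered[-1],
        }


def fake_put(key, value):
    """Hypothetical stand-in for a real put call; it only returns the size."""
    return len(value)


recorder = LatencyRecorder()
for i in range(10):
    recorder.record(fake_put, f"key-{i}", b"x" * 1024)
stats = recorder.summary()
```

Tagging each sample with the request ID on the inference side would let the per-call numbers be grouped back into per-request latency.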

[Performance]: Is there any way to measure network transmission latency for each llm request with mooncake or transfer engine?

> > > Transfer Engine does not perceive the upper-layer LLM request, so you should measure per-request network latency on the inference side. However, the latency can be recorded for...

[Performance]: Is there any way to measure network transmission latency for each llm request with mooncake or transfer engine?

@stmatengss I have implemented reporting of task completion delay distributions in PR #1130, PTAL

[Bug]: Uneven Network Utilization in Multi-Node Prefill: Only Ray Head Node Transmits KV Cache

Hi @JayFzh, I'm not very familiar with Ray, but the Mooncake transfer engine doesn't care whether vLLM uses MP or Ray, and it supports P2P transfer between any pair of nodes. I recommend confirming...

[Documentation]: What's the difference between doc and docs?

I have an idea: to reduce the effort of writing documents in both Chinese and English, we could introduce AI to assist with synchronization. After that, when updates are needed,...