Wu Jiawei
> > > What changes will occur in the end-to-end latency of each RANK? Can it be estimated as max(Dispatch latency) + Expert Group Gemm latency + max(Combine latency)? >...
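
The estimate proposed in the question above can be written down as a small sketch. It only encodes the formula as asked, under the assumption that dispatch, expert GEMM, and combine run strictly one after another with no overlap. Whether that assumption holds is exactly what the question is asking. The function name and the timings are hypothetical, not measurements.

```python
def estimate_rank_latency(dispatch_us, gemm_us, combine_us):
    """Serial estimate: max(dispatch) + expert GEMM + max(combine).

    dispatch_us / combine_us: per-rank latencies in microseconds.
    gemm_us: expert group GEMM latency in microseconds.
    Assumes the three phases do not overlap (an assumption, not a fact).
    """
    if not dispatch_us or not combine_us:
        raise ValueError("need at least one dispatch and one combine latency")
    return max(dispatch_us) + gemm_us + max(combine_us)


# Hypothetical timings for three ranks, for illustration only.
dispatch = [110.0, 125.0, 118.0]
combine = [140.0, 133.0, 150.0]
print(estimate_rank_latency(dispatch, 300.0, combine))  # 575.0
```

If communication and computation overlap, the real latency would fall below this number, so the formula is best read as an upper bound for the serial case.
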
> > Hello, I encountered a similar problem. May I know your specific problem and how you solved it? Thanks! > > Hello, I haven't found a good solution...
> As can be seen, when using 9 warpgroups, i.e. few SMs, the performance only slows down slightly. This makes a simple overlap between this and computation feasible....
> The DeepEP kernel uses fewer SMs, so who will use the extra SMs? For example, in the decode phase of sglang, the communication kernel and the compute kernel are serial....
> I discussed this issue with [@liuhe-spec](https://github.com/liuhe-spec) on WeChat, and we strongly suspect it is likely related to RoCE network congestion control. > > If possible, you can ask your...
Hi, to achieve the overlap described in Parameters for Overlap, should we modify the sglang forward pass and launch the GEMM multiple times (to calculate different expert groups)?
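
To make the question concrete, here is a back-of-the-envelope timeline model of what splitting the experts into groups could buy. It is a sketch under stated assumptions, not sglang or DeepEP code: communication for group i+1 is assumed to run concurrently with the GEMM for group i, and SM contention and kernel launch overhead are ignored. All names and timings are hypothetical.

```python
def serial_latency(comm_us, gemm_us):
    """All communication, then all GEMMs, with no overlap."""
    return sum(comm_us) + sum(gemm_us)


def pipelined_latency(comm_us, gemm_us):
    """Two-stage pipeline over expert groups.

    The GEMM for a group starts once its communication has finished
    and the previous group's GEMM has finished.
    """
    comm_done = 0.0
    gemm_done = 0.0
    for c, g in zip(comm_us, gemm_us):
        comm_done += c                            # comm runs back to back
        gemm_done = max(comm_done, gemm_done) + g  # wait for data and for the GEMM slot
    return gemm_done


# Hypothetical per-group timings in microseconds (4 expert groups).
comm = [50.0, 50.0, 50.0, 50.0]
gemm = [75.0, 75.0, 75.0, 75.0]
print(serial_latency(comm, gemm))     # 500.0
print(pipelined_latency(comm, gemm))  # 350.0
```

In this model only the first group's communication is exposed and the rest hides behind the GEMMs. Whether that carries over in practice depends on how many SMs the communication kernel leaves free for the GEMM.
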