CuiBo
@shisoft
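I would like to add a CUDA implementation to the backend of the code. Would that be possible? I'd like to discuss the interface and class design with the community.
I would like to try helping the community add GPU support. Could you outline the general approach?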
I agree with your viewpoint. However, retaining comments can help readers better understand the logic of this code. I am available to assist you in either refining the comments or deleting...
Confused +1
I am working on it too. It looks like the function `AttnFuncWithCPAndQKVOA2A` can support context parallelism for MLA. Is my conclusion correct, and what are the main reasons currently preventing...
I think we can support MLA+CP in P2P by padding the V tensor, which keeps modifications to the original code minimal. I am currently attempting this method.
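To make the padding idea above concrete, here is a minimal pure-Python sketch, not the library's actual code. It assumes the V head dimension is smaller than the Q/K head dimension (as in typical MLA configurations), that V is zero-padded along the head dimension, and that the extra output columns are sliced off afterwards. The helper names `attention`, `pad_v`, and `attention_with_padded_v` are hypothetical.

```python
import math


def attention(q, k, v):
    """Plain softmax attention for one head; q, k, v are lists of row vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v)) / total
            for d in range(len(v[0]))
        ])
    return out


def pad_v(v, head_dim_qk):
    """Zero-pad each V row up to the Q/K head dimension (assumed >= V's)."""
    return [row + [0.0] * (head_dim_qk - len(row)) for row in v]


def attention_with_padded_v(q, k, v):
    """Run attention on padded V, then slice the output back to V's width."""
    head_dim_v = len(v[0])
    padded_out = attention(q, k, pad_v(v, len(q[0])))
    return [row[:head_dim_v] for row in padded_out]
```

Because the padded columns of V are all zeros, they contribute only zero columns to the output, so slicing recovers the unpadded result. The cost is extra memory and communication for the padded columns.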
You didn't answer my question.