Hello_World

Results: 18 comments by Hello_World

Hi, I guess this is because we only generate device code for SM60 (a.k.a. the Pascal architecture). I have fixed this in https://github.com/MegviiRobot/MegBA/pull/27, but I am not sure this will work...

Sorry for the late reply. I agree that the feature you mentioned is very important. Because data transfer on the GPU has high latency, all the data layouts are predetermined...

Hi KBentley57, Thanks for your comments! We are indeed planning to integrate MegBA into Colmap, and we would be very glad to have you participate in MegBA. What kind of...

Hi Kyle, So glad to hear that so many people would like to join the project. We think we could make the project robust and powerful together. I agree with...

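Thank you for the patient reply. From an implementation standpoint, I would personally suggest devoting some space to an introduction at the CUTLASS level. As the article says, a library like cuBLAS is very inflexible and performs poorly in corner cases, while the WMMA instructions focus on too much low-level detail, which makes it easy for an implementation to lose coherence from start to finish or to lack code abstraction, increasing the programming difficulty. In practice, operators are usually developed with a library like CUTLASS, which provides abstraction so that you do not need to attend to every line of the implementation, while still offering enough flexibility (support for operator fusion and custom operators such as depthwise conv) and high performance (close to that of cuBLAS). Best regards, Jie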

It is about finding an `inv_layout` of `layout` that satisfies `layout(inv_layout(i)) = i`.
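As a rough illustration of that identity, here is a minimal host-side sketch. It assumes a layout is a bijective map from a linear index (taken column-major over the shape) to a memory offset in `[0, size)`. `Layout2D` and `inverse_layout` are hypothetical names made up for this example, not the API of any library, and the inverse is built as a plain lookup table rather than as another layout object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for illustration only: a 2-D layout with explicit strides.
struct Layout2D {
    std::size_t rows, cols;              // shape
    std::size_t row_stride, col_stride;  // strides, in elements

    std::size_t size() const { return rows * cols; }

    // Map a linear index (column-major over the shape) to a memory offset.
    std::size_t operator()(std::size_t i) const {
        const std::size_t r = i % rows;
        const std::size_t c = i / rows;
        return r * row_stride + c * col_stride;
    }
};

// Build a table inv such that layout(inv[k]) == k for every offset k.
// Assumption: the layout is a bijection onto [0, size()).
std::vector<std::size_t> inverse_layout(const Layout2D& layout) {
    const std::size_t n = layout.size();
    std::vector<std::size_t> inv(n, n);  // the value n marks "not yet filled"
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t offset = layout(i);
        // The offset must be in range and must not have been produced before.
        assert(offset < n && inv[offset] == n);
        inv[offset] = i;
    }
    return inv;
}
```

For a row-major 2x3 layout, `Layout2D{2, 3, 3, 1}`, index 1 (row 1, column 0) maps to offset 3, so the table stores `inv[3] == 1`, and `layout(inv[k]) == k` holds for every `k`.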

Thanks for your comment! I am currently fully focused on designing sparse solvers to beat solvers like CHOLMOD, so the Colmap integration plan is still on my to-do list :-(...

Opened PR: https://github.com/pghysels/STRUMPACK/pull/112

I believe the CUTLASS version is for something like initialising. Check the following global loading code snippet: ```cuda __device__ void global_load(uint4 &D, void const *ptr, bool pred_guard) { uint4 &data...