Question about Marlin fetch_to_registers

Open darrenearl opened this issue 1 year ago • 0 comments

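Hello, while reading your Marlin w4a8 code, I came across this line in the fetch_to_registers function: int4* sh_s3_stage = sh_s3 + s3_sh_stage * ((group_blocks / thread_k_blocks) * (pipe / (group_blocks / thread_k_blocks))); I found that when pipe is 0 or 1 the offset is 0, and when pipe is 2 or 3 the offset is 64. Why is it 64 here rather than 32? My setup is m = 16, k = 512, n = 256, group_size = 128, with thread_k = 64 and thread_n = 256 for the matrix multiplication.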
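For reference, the arithmetic of the quoted expression can be reproduced on the host with a small sketch. The helper name `scale_stage_offset` is hypothetical and not part of the repository. The sketch assumes group_blocks = group_size / 16 = 8 and thread_k_blocks = thread_k / 16 = 4, and it assumes s3_sh_stage = 32 (in int4 units), a value inferred from the offsets reported above rather than read from the kernel.

```cpp
// Hypothetical helper (not from the repository): mirrors the integer
// arithmetic of the quoted sh_s3_stage expression on the host.
constexpr int scale_stage_offset(int pipe, int group_blocks,
                                 int thread_k_blocks, int s3_sh_stage) {
  // Number of k-tiles (pipeline stages) covered by one quantization group.
  // Assumed inputs: group_blocks = 128 / 16 = 8, thread_k_blocks = 64 / 16 = 4.
  int tiles_per_group = group_blocks / thread_k_blocks;
  // Integer division rounds pipe down to its group index (0, 0, 1, 1);
  // multiplying back gives the first stage of that group (0, 0, 2, 2).
  int first_stage_of_group = tiles_per_group * (pipe / tiles_per_group);
  // Offset in int4 units from the start of sh_s3.
  return s3_sh_stage * first_stage_of_group;
}

// Compile-time check against the offsets reported in the issue
// (s3_sh_stage = 32 is an assumption).
static_assert(scale_stage_offset(2, 8, 4, 32) == 64, "pipe 2 maps to offset 64");
```

Under these assumptions the stage index is rounded down to a multiple of group_blocks / thread_k_blocks = 2, so pipes 2 and 3 both map to stage slot 2 and the offset is 2 * 32 = 64. An offset of 32 would correspond to indexing by group number (pipe / 2) instead. This suggests the scales are addressed by the first pipeline stage of each group, though only the author can confirm that this is the intent.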

darrenearl, Oct 09 '24 12:10