
A question for Teacher Xiaopeng (小彭老师): what is the best way to optimize this function with AVX2?

Open lin0ww0nil opened this issue 2 years ago • 4 comments

saunlesuanle

lin0ww0nil avatar Sep 14 '23 07:09 lin0ww0nil

for (int j = 0; j < height; j++) {
    dst[0] = col_0[height - 1 - j];
    if (2 == td) {
        dst[1] = col_1_td2[height - 1 - j];
    }

    if ((3 + 4 * j) < width) {
        memcpy(dst + td, ref_left + (4 * (height - 1) + rem_rl - 1) - (4 * j + rem_rl - 1), (rem_rl + 4 * j) * sizeof(s16));
        memcpy(dst + 3 + 4 * j, ref_above, (width - (3 + 4 * j)) * sizeof(s16));
    }
    else {
        // w - 3
        memcpy(dst + td, ref_left + (4 * (height - 1) + rem_rl - 1) - (4 * j + rem_rl - 1), (width - td) * sizeof(s16));
    }

    dst += i_dst;
}
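Before reaching for intrinsics, it may help to tidy the scalar loop. In both branches the source pointer reduces to `ref_left + 4 * (height - 1 - j)`, because the `rem_rl - 1` terms cancel. The sketch below is only an illustration under assumptions: `s16` is taken to be `int16_t`, and the wrapper function `fill_rows` with its parameter list is hypothetical, not part of the posted code.

```c
#include <stdint.h>
#include <string.h>

typedef int16_t s16;  /* assumption: s16 is a signed 16-bit integer */

/* Hypothetical wrapper around the posted loop; behavior is unchanged,
 * only the pointer arithmetic is simplified. */
void fill_rows(s16 *dst, int i_dst, int width, int height, int td,
               int rem_rl, const s16 *col_0, const s16 *col_1_td2,
               const s16 *ref_left, const s16 *ref_above)
{
    for (int j = 0; j < height; j++) {
        const int r = height - 1 - j;        /* reversed row index */
        const s16 *left = ref_left + 4 * r;  /* simplified source offset */
        const int split = 3 + 4 * j;         /* column where ref_above starts */

        dst[0] = col_0[r];
        if (td == 2) {
            dst[1] = col_1_td2[r];
        }

        if (split < width) {
            /* left reference first, then the above reference */
            memcpy(dst + td, left, (size_t)(rem_rl + 4 * j) * sizeof(s16));
            memcpy(dst + split, ref_above, (size_t)(width - split) * sizeof(s16));
        } else {
            /* the whole remainder of the row comes from the left reference */
            memcpy(dst + td, left, (size_t)(width - td) * sizeof(s16));
        }

        dst += i_dst;  /* advance to the next output row */
    }
}
```

With the offsets written this way it is easier to see which spans each row reads, which is the information needed when deciding whether the copies can be replaced by wider loads and stores.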

All the memcpy calls inside this loop copy to dst + rd. Are you sure that is correct?

archibate avatar Sep 17 '23 01:09 archibate

saunlesuanle

lin0ww0nil avatar Sep 18 '23 02:09 lin0ww0nil

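I meant td. td is constant within this for loop, so why copy repeatedly into the same dst + td? My tests show that 60% of the time is spent in these three memcpy calls.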

archibate avatar Sep 18 '23 02:09 archibate

saunlesuanle

lin0ww0nil avatar Sep 18 '23 02:09 lin0ww0nil