Test out mulhrs vs add+shift in decompose

jammychiou1 opened this issue

AVX2 has no rounding right-shift instruction (such as URSHR in Neon). Instead, decompose emulates one using "mulhrs with a power of 2" (here for example). Although this needs only one instruction, add+shift is likely still faster, since a vector multiply typically has a longer latency than an add followed by a shift.
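The equivalence behind the trick can be checked with a scalar model of a single 16-bit lane. This is a sketch, not code from mldsa-native: the helper names (`mulhrs_lane`, `add_shift_lane16`, `count_mismatches`) are hypothetical, and it assumes the add+shift replacement would be `_mm256_add_epi16` followed by `_mm256_srai_epi16`. Per lane, `_mm256_mulhrs_epi16(a, b)` computes `((a*b >> 14) + 1) >> 1` from a 32-bit product, so with `b = 2^k` (0 <= k <= 14) it equals the rounding right shift `(a + 2^(14-k)) >> (15-k)`. The model also surfaces one caveat: the add+shift form does its add in 16 bits, so it only agrees when `a + 2^(14-k)` stays within `INT16_MAX`.

```c
#include <assert.h>
#include <stdint.h>

/* Portable arithmetic (flooring) right shift; plain >> on a negative
 * signed value is implementation-defined in C. */
static int32_t asr32(int32_t x, int s)
{
  return x >= 0 ? (x >> s) : ~(~x >> s);
}

/* Scalar model of one 16-bit lane of vpmulhrsw (_mm256_mulhrs_epi16):
 * 32-bit product, shift right by 14, add 1, shift right by 1.
 * For b = 2^k with 0 <= k <= 14 the result always fits in int16. */
static int16_t mulhrs_lane(int16_t a, int16_t b)
{
  int32_t p = (int32_t)a * (int32_t)b;
  return (int16_t)asr32(asr32(p, 14) + 1, 1);
}

/* Hypothetical add+shift replacement for one lane: a 16-bit wrapping add
 * of the rounding constant 2^(s-1) (as _mm256_add_epi16 would do),
 * then an arithmetic right shift by s (as _mm256_srai_epi16 would do). */
static int16_t add_shift_lane16(int16_t a, int s)
{
  assert(s >= 1 && s <= 15);
  uint16_t u = (uint16_t)((uint16_t)a + (uint16_t)(1u << (s - 1)));
  int32_t t = (u >= 0x8000u) ? (int32_t)u - 0x10000 : (int32_t)u;
  return (int16_t)asr32(t, s);
}

/* Returns nonzero if a + 2^(s-1) would exceed INT16_MAX in a 16-bit add. */
static int add_overflows16(int16_t a, int s)
{
  return (int32_t)a + ((int32_t)1 << (s - 1)) > INT16_MAX;
}

/* Counts (a, k) pairs, with a ranging over all int16 values and k over
 * [0, 14], where mulhrs(a, 2^k) differs from add+shift by 15 - k.
 * If skip_overflow is nonzero, inputs whose 16-bit add overflows are
 * left out of the comparison. */
static long count_mismatches(int skip_overflow)
{
  long mismatches = 0;
  for (int k = 0; k <= 14; k++) {
    int16_t b = (int16_t)(1 << k);
    int s = 15 - k;
    for (int32_t a = INT16_MIN; a <= INT16_MAX; a++) {
      if (skip_overflow && add_overflows16((int16_t)a, s)) {
        continue;
      }
      if (mulhrs_lane((int16_t)a, b) != add_shift_lane16((int16_t)a, s)) {
        mismatches++;
      }
    }
  }
  return mismatches;
}
```

`count_mismatches(1)` should return 0, meaning the two forms agree on every input whose 16-bit add does not wrap, while `count_mismatches(0)` counts exactly the inputs where it does. Whether such inputs can reach this step in decompose depends on the range of the values fed into it, which this sketch does not model.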

See https://github.com/pq-code-package/mldsa-native/pull/629#discussion_r2508945174 for more context.

Nov 10 '25 08:11 jammychiou1