jammychiou1
- The floor() in floor((f + 127) >> 7) was somewhat unnecessary, as the usual semantics of the right-shift operator (>>) already produce an integer. Seeing as the right-shift operator...
AVX2 doesn't have a rounding right-shift instruction (such as URSHR in Neon). Instead, it was simulated using "mulhrs with a power of 2" in decompose ([here](https://github.com/pq-code-package/mldsa-native/blob/45e64ca9d22a2121757b2044820f517539b2cf9c/dev/x86_64/src/poly_decompose_32_avx2.c#L56) for example). While this only needs one...
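To make the "mulhrs with a power of 2" trick concrete, here is a minimal scalar sketch in portable C. It models one 16-bit lane of `vpmulhrsw` (`_mm256_mulhrs_epi16`) as `(((a * b) >> 14) + 1) >> 1` and compares it with a rounding right shift by `k` when `b = 2^(15 - k)`. The helper names `mulhrs16`, `rshr16` and `mulhrs_mismatches` are hypothetical and do not appear in mldsa-native. The check is restricted to non-negative inputs, which avoids relying on implementation-defined right shifts of negative values in C.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of AVX2 vpmulhrsw:
 * (((a * b) >> 14) + 1) >> 1. Hypothetical helper, for illustration only. */
static int16_t mulhrs16(int16_t a, int16_t b)
{
  int32_t p = (int32_t)a * (int32_t)b;
  return (int16_t)(((p >> 14) + 1) >> 1);
}

/* Rounding right shift by k (ties round up), 1 <= k <= 15,
 * for non-negative a. */
static int16_t rshr16(int16_t a, unsigned k)
{
  return (int16_t)(((int32_t)a + ((int32_t)1 << (k - 1))) >> k);
}

/* Counts the (a, k) pairs with 0 <= a <= 32767 and 1 <= k <= 15 where
 * mulhrs by 2^(15 - k) differs from a rounding right shift by k. */
static long mulhrs_mismatches(void)
{
  long bad = 0;
  for (unsigned k = 1; k <= 15; k++)
  {
    int16_t pow2 = (int16_t)(1 << (15 - k));
    for (int32_t a = 0; a <= 32767; a++)
    {
      if (mulhrs16((int16_t)a, pow2) != rshr16((int16_t)a, k))
      {
        bad++;
      }
    }
  }
  return bad;
}
```

Under this model, `mulhrs16(x, 512)` behaves like a rounding right shift by 6 for non-negative `x`, since 512 = 2^(15 - 6).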
Decompose() naturally needs round-half-down. It can be computed with a rounding mulhi, which always rounds half up, but our current explanation ([here](https://github.com/pq-code-package/mldsa-native/blob/45e64ca9d22a2121757b2044820f517539b2cf9c/dev/aarch64_clean/src/poly_decompose_32_asm.S#L12-L15) for example) doesn't justify this very clearly. See https://github.com/pq-code-package/mldsa-native/pull/411#discussion_r2371057773...
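The difference between the two rounding modes can be checked numerically. The sketch below assumes the parameter set with γ2 = (q − 1)/32 (suggested by the `poly_decompose_32` file names) and uses the constants 127, 1025 and 2^21 as in the public ML-DSA reference C code; it is not taken from the linked assembly, and it is not the justification given in the linked discussion. The function names are hypothetical. It covers only the high part before the final `& 15` wraparound mask and omits the low part.

```c
#include <stdint.h>

#define Q 8380417u
#define GAMMA2 ((Q - 1) / 32) /* assumed parameter set: 2*GAMMA2 = 128 * 4092 */

/* round(a / (2*GAMMA2)) with ties rounded down: ceil((a - GAMMA2) / (2*GAMMA2)) */
static uint32_t a1_half_down(uint32_t a)
{
  return (a + GAMMA2 - 1) / (2 * GAMMA2);
}

/* round(a / (2*GAMMA2)) with ties rounded up, shown for contrast */
static uint32_t a1_half_up(uint32_t a)
{
  return (a + GAMMA2) / (2 * GAMMA2);
}

/* Reference-style computation: ceil(a / 128), then a multiply by 1025
 * followed by a round-half-up shift by 22 (the "+ (1 << 21)" term). */
static uint32_t a1_mulhi(uint32_t a)
{
  uint32_t t = (a + 127) >> 7;
  return (t * 1025 + (1u << 21)) >> 22;
}

/* Counts inputs 0 <= a < Q where the reference-style computation
 * disagrees with round-half-down. */
static long decompose_mismatches(void)
{
  long bad = 0;
  for (uint32_t a = 0; a < Q; a++)
  {
    if (a1_mulhi(a) != a1_half_down(a))
    {
      bad++;
    }
  }
  return bad;
}
```

The two rounding modes differ only at exact ties, for example `a = GAMMA2`, where `a1_half_down` gives 0 and `a1_half_up` gives 1. The round-half-up step still lands on the round-half-down result here because 1025 · 4092 = 2^22 − 4, so the scaled value at a tie falls just below the rounding threshold.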