Avoid calling keccak_absorb with partial lanes
On 32-bit architectures, each call to mld_keccakf1600_xor_bytes incurs an overhead. For example, on Armv7-M and Armv8-M, using the optimised bit interleaving from XKCP, XORing a lane into the state costs an extra 37 instructions. Whenever an incomplete lane is XORed into the state, this penalty is paid twice: once by the call that XORs the first part of the lane and again by the call that XORs the rest. This PR ensures that only full lanes are XORed into the state.
Fixes #445
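To make the "paid twice" effect concrete, here is a small cost model in C. It is a sketch under stated assumptions, not code from this PR. The names `lanes_touched`, `naive_cost` and `buffered_cost` are hypothetical. The model counts lane XORs only, not instructions. It also assumes that the rate is a multiple of 8 bytes, which holds for SHAKE128 (168 bytes) and SHAKE256 (136 bytes), so rate boundaries never split a lane.

```c
#include <assert.h>
#include <stddef.h>

#define LANE_BYTES 8u

/* Number of 64-bit lanes overlapped by the byte range [offset, offset + len).
 * In this model, each overlapped lane costs one bit-interleave conversion
 * when it is XORed into the state on a 32-bit target. */
static size_t lanes_touched(size_t offset, size_t len)
{
  assert(len > 0); /* callers skip empty chunks */
  return (offset + len - 1) / LANE_BYTES - offset / LANE_BYTES + 1;
}

/* Unbuffered absorb: every chunk is XORed straight into the state, so a
 * lane split across two chunks is converted once per chunk. */
static size_t naive_cost(const size_t *chunks, size_t n)
{
  size_t offset = 0, cost = 0;
  for (size_t i = 0; i < n; i++) {
    if (chunks[i] > 0) cost += lanes_touched(offset, chunks[i]);
    offset += chunks[i];
  }
  return cost;
}

/* Buffered absorb: only complete lanes are XORed, and the trailing partial
 * lane (if any) is XORed exactly once, so every lane is converted once. */
static size_t buffered_cost(const size_t *chunks, size_t n)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++) total += chunks[i];
  return (total + LANE_BYTES - 1) / LANE_BYTES;
}
```

For example, absorbing three 12-byte inputs gives a naive cost of 6 lane XORs but a buffered cost of 5. Lane 1 (bytes 8 to 15) is split between the first and second input, so the unbuffered path converts it twice.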
Please provide a description for this PR: what is the point of this refactoring, and what benefit does it bring? Please also provide a CBMC proof harness and Makefile for any new functions.
@bremoran, sorry for the long wait for the review on this. Could you please rebase this on top of the changes in main, so we can benchmark and review it?
@bremoran, that was not quite what I meant by rebasing. I applied the changes required to make this work myself in https://github.com/pq-code-package/mldsa-native/pull/450/commits/a8d2d6a8b0efde1e37923bb4d8373a1645d12b6c.
Performance-wise, there is no reason not to merge this. There is even a small improvement on Cortex-A55 (1-3%) and, for reasons that are beyond me, on 4th-gen AMD EPYC (c7a).
CBMC proofs are failing, but we can fix that at a later point.
Fundamentally, I believe such caching does not belong in sign.c; it should be done in fips202.c. One could make the incomplete lane part of the Keccak state, which would make it a little cleaner, but it would still clutter the code somewhat.
WDYT @hanno-becker?
I see one proof failure in mld_H. Let me take a look...
Thanks @bremoran! I can definitely see this being useful for 32-bit platforms.
A few requests:
- I don't think this needs an API extension. Instead, buffering input before it is XORed into the state should be an implementation detail of the existing absorb API (add a buffer for the incomplete lane).
- We should have documentation and CBMC proofs for new functionality.
- The new logic belongs in FIPS-202.
Could you adjust the PR accordingly?
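A minimal sketch of what keeping the incomplete lane inside the absorb context might look like, so that callers keep a plain absorb(ctx, in, inlen) interface. Everything here is hypothetical and is not the mldsa-native API: `keccak_ctx`, `ctx_absorb`, `ctx_finalize` and `xor_full_lane` are illustrative names. Lanes are stored plainly rather than bit-interleaved, and `permute_stub` is a stand-in so the sketch runs. It is NOT Keccak-f[1600]. The SHAKE256 rate (17 lanes) and the SHAKE padding bytes (0x1F, 0x80) are the only FIPS 202 details assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANE_BYTES 8u
#define RATE_LANES 17u /* SHAKE256: 136-byte rate = 17 lanes */

typedef struct {
  uint64_t s[25];           /* state, plain (non-interleaved) lanes */
  unsigned pos;             /* next lane index within the rate */
  uint8_t lane[LANE_BYTES]; /* pending bytes of an incomplete lane */
  unsigned lane_len;        /* number of pending bytes, 0..7 */
  unsigned long lane_xors;  /* instrumentation: lanes XORed into s */
} keccak_ctx;

/* Stand-in so the sketch runs: NOT Keccak-f[1600]. */
static void permute_stub(uint64_t s[25])
{
  for (unsigned i = 0; i < 25; i++)
    s[i] = ((s[i] << 1) | (s[i] >> 63)) ^ s[(i + 1) % 25] ^ i;
}

static uint64_t load64_le(const uint8_t b[LANE_BYTES])
{
  uint64_t w = 0;
  for (unsigned i = 0; i < LANE_BYTES; i++) w |= (uint64_t)b[i] << (8 * i);
  return w;
}

/* The only place lanes enter the state during absorption; on a 32-bit
 * target the interleaving cost would be paid here, once per lane. */
static void xor_full_lane(keccak_ctx *ctx, const uint8_t b[LANE_BYTES])
{
  ctx->s[ctx->pos] ^= load64_le(b);
  ctx->lane_xors++;
  if (++ctx->pos == RATE_LANES) {
    permute_stub(ctx->s);
    ctx->pos = 0;
  }
}

static void ctx_init(keccak_ctx *ctx) { memset(ctx, 0, sizeof *ctx); }

static void ctx_absorb(keccak_ctx *ctx, const uint8_t *in, size_t inlen)
{
  assert(ctx->lane_len < LANE_BYTES);
  if (inlen == 0) return;
  if (ctx->lane_len > 0) { /* 1. top up the pending lane first */
    size_t take = LANE_BYTES - ctx->lane_len;
    if (take > inlen) take = inlen;
    memcpy(ctx->lane + ctx->lane_len, in, take);
    ctx->lane_len += (unsigned)take;
    in += take;
    inlen -= take;
    if (ctx->lane_len < LANE_BYTES) return; /* still incomplete: no XOR */
    xor_full_lane(ctx, ctx->lane);
    ctx->lane_len = 0;
  }
  while (inlen >= LANE_BYTES) { /* 2. full lanes straight from the input */
    xor_full_lane(ctx, in);
    in += LANE_BYTES;
    inlen -= LANE_BYTES;
  }
  memcpy(ctx->lane, in, inlen); /* 3. stash the tail (< 8 bytes) */
  ctx->lane_len = (unsigned)inlen;
}

/* XOR the tail exactly once, together with the SHAKE padding, then permute. */
static void ctx_finalize(keccak_ctx *ctx)
{
  memset(ctx->lane + ctx->lane_len, 0, LANE_BYTES - ctx->lane_len);
  ctx->lane[ctx->lane_len] ^= 0x1F; /* SHAKE domain bits + first pad bit */
  ctx->s[ctx->pos] ^= load64_le(ctx->lane);
  ctx->lane_xors++;
  ctx->s[RATE_LANES - 1] ^= (uint64_t)0x80 << 56; /* last pad bit */
  permute_stub(ctx->s);
  ctx->pos = 0;
  ctx->lane_len = 0;
}
```

In this sketch, absorbing a 300-byte message in one call or in arbitrary chunks (including empty ones) yields the same state. Both paths perform the same 38 lane XORs: 37 full lanes plus one padded tail lane. Because the pending lane lives in the context, the caching stays out of sign.c, and the absorb signature does not change.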
I agree. Marking this as draft for now. @bremoran, please mark it as ready when you have updated the PR. Let us know if you need help with adjusting the CBMC proofs.