[poly]: Implement an end-to-end Dilithium signature scheme to exercise the polynomial dialect lowerings
As the title says.
We considered the Dilithium signature scheme and the Kyber KEM as initial "end-to-end" examples for the polynomial dialect and this lowering. While Dilithium shouldn't be an issue, Kyber's ring doesn't support a full-depth NTT (Kyber's q = 3329 satisfies q ≡ 1 mod 256 but not q ≡ 1 mod 512, so X^256 + 1 only splits into quadratic factors over Z_q), so the lowering would have to be aware of this (see https://electricdusk.com/ntt.html for a nice explanation of the issue).
Let's make this issue a bit more specific for anyone reading along who isn't familiar with the details.
Cf. the Dilithium spec, section 1.2 "Implementation Considerations" (page 5) of https://pq-crystals.org/dilithium/data/dilithium-specification-round3-20210208.pdf
They work over Z_q[X]/(X^256 + 1) and do a matrix-vector multiplication where the matrix has size 6x5 (for the Dilithium3 parameter set; the other sets use different dimensions). They use the "fully-splitting NTT algorithm" so that the polynomials can be multiplied componentwise after being transformed.
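To make the componentwise-multiplication point concrete, here is a toy Python sketch (illustration only, not HEIR/MLIR code and not the Dilithium reference implementation) of multiplication in Z_q[X]/(X^n + 1) via a fully-splitting negacyclic NTT. It uses tiny parameters (n = 8, q = 17) instead of Dilithium's n = 256, q = 8380417; in both cases 2n divides q - 1, which is exactly what makes the full split into linear factors possible:

```python
# Illustration only: multiplication in Z_q[X]/(X^n + 1) via a fully-splitting
# negacyclic NTT, with toy parameters (Dilithium uses n = 256, q = 8380417).

def find_psi(n, q):
    """Brute-force a primitive 2n-th root of unity mod q (toy sizes only)."""
    for psi in range(2, q):
        if pow(psi, n, q) == q - 1:   # psi^n = -1, so psi has order 2n (n a power of two)
            return psi
    raise ValueError("need 2n | q - 1 for a fully-splitting NTT")

def ntt(a, psi, q):
    """Naive O(n^2) negacyclic NTT: evaluate a(X) at the odd powers of psi."""
    n = len(a)
    return [sum(a[j] * pow(psi, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(ah, psi, q):
    """Inverse of ntt(), including the 1/n scaling (q is prime)."""
    n = len(ah)
    n_inv, psi_inv = pow(n, q - 2, q), pow(psi, q - 2, q)
    return [sum(ah[i] * pow(psi_inv, (2 * i + 1) * j, q) for i in range(n))
            * n_inv % q for j in range(n)]

def poly_mul(a, b, psi, q):
    """a * b mod (X^n + 1, q) as NTT -> componentwise mul -> INTT."""
    return intt([x * y % q for x, y in zip(ntt(a, psi, q), ntt(b, psi, q))],
                psi, q)

# Toy check against schoolbook negacyclic multiplication.
n, q = 8, 17                          # 2n = 16 divides q - 1 = 16
psi = find_psi(n, q)
a, b = [1, 2, 3, 4, 0, 0, 0, 0], [5, 6, 7, 8, 0, 0, 0, 0]
ref = [0] * n
for i in range(n):
    for j in range(n):
        sign = -1 if i + j >= n else 1
        ref[(i + j) % n] = (ref[(i + j) % n] + sign * a[i] * b[j]) % q
assert poly_mul(a, b, psi, q) == ref
```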
This ticket should be split into more manageable pieces. I think they might be:
- Define appropriate IR constructs to identify polynomial representations in NTT/FFT/coefficient domains. I believe we discussed using the tensor encoding attribute to store something like "this tensor is an NTT-domain polynomial with parameters ..."
- Write a pass that converts "naive" polynomial multiplications into combinations of NTT + element-wise mul + INTT
- Write canonicalizations that collapse unnecessary NTT/INTT
- Write an initial lowering for NTT/INTT using a standard algorithm (see the sketch below)
- Support lowering a linalg.generic of polymuls (should already be supported by @AlexanderViand-Intel's recent work)
- Write optimized NTT/INTT lowerings for special hardware (e.g., AVX)
If you agree, I can go ahead and file those issues, and this ticket can perhaps be switched to adding an end-to-end test of a Dilithium signature.
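On the "initial lowering for NTT/INTT using a standard algorithm" item: the usual candidate is the iterative O(n log n) butterfly formulation, i.e. a Cooley-Tukey forward transform and a Gentleman-Sande inverse with the negacyclic twist folded into precomputed twiddle factors (the structure described by Longa and Naehrig in "Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography"). Here is a minimal Python sketch of that loop structure; an actual lowering would emit the equivalent loop nest in IR (e.g. affine.for) rather than Python, and the constants are toy values, not Dilithium's:

```python
# Iterative Cooley-Tukey forward NTT / Gentleman-Sande inverse NTT with the
# negacyclic psi twist merged into the twiddle factors.  Toy parameters only.

def bit_reverse(x, bits):
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def ntt_ct(a, psi, q):
    """Forward negacyclic NTT; standard-order input, bit-reversed output."""
    a, n = list(a), len(a)
    logn = n.bit_length() - 1
    psi_rev = [pow(psi, bit_reverse(i, logn), q) for i in range(n)]
    t, m = n, 1
    while m < n:
        t //= 2
        for i in range(m):
            s = psi_rev[m + i]
            for j in range(2 * i * t, 2 * i * t + t):
                u, v = a[j], a[j + t] * s % q
                a[j], a[j + t] = (u + v) % q, (u - v) % q
        m *= 2
    return a

def intt_gs(a, psi, q):
    """Inverse negacyclic NTT; bit-reversed input, standard-order output."""
    a, n = list(a), len(a)
    logn = n.bit_length() - 1
    psi_inv = pow(psi, q - 2, q)
    psi_inv_rev = [pow(psi_inv, bit_reverse(i, logn), q) for i in range(n)]
    t, m = 1, n
    while m > 1:
        h, j1 = m // 2, 0
        for i in range(h):
            s = psi_inv_rev[h + i]
            for j in range(j1, j1 + t):
                u, v = a[j], a[j + t]
                a[j], a[j + t] = (u + v) % q, (u - v) * s % q
            j1 += 2 * t
        t *= 2
        m //= 2
    n_inv = pow(n, q - 2, q)
    return [x * n_inv % q for x in a]

# Pointwise products in the transform domain give negacyclic multiplication.
n, q, psi = 4, 17, 2                  # psi = 2 is a primitive 2n-th root mod 17
a, b = [1, 2, 3, 4], [2, 0, 1, 0]
assert intt_gs(ntt_ct(a, psi, q), psi, q) == a      # round trip
c = intt_gs([x * y % q for x, y in zip(ntt_ct(a, psi, q), ntt_ct(b, psi, q))],
            psi, q)
assert c == [16, 0, 7, 10]            # == a * b mod (X^4 + 1, 17)
```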
Let me also add two FHE-specific references that are likely to be useful (in addition to more general work on fast NTT/modular arithmetic implementation):
- Intel HEXL: Accelerating Homomorphic Encryption with Intel AVX512-IFMA52, which gives some details on how to exploit AVX512 for HE
- HEaaN.MLIR: An Optimizing Compiler for Fast Ring-Based Homomorphic Encryption, which has done exactly this work of implementing fast FHE-friendly polynomial ring arithmetic in MLIR; it is unfortunately not open-source, but might still be a good reference.
> This ticket should be split into more manageable pieces. I think they might be [...] If you agree, I can go ahead and file those issues, and this ticket can perhaps be switched to adding an end-to-end test of a Dilithium signature.
Yes, I think that'd be a good idea! I'm guessing you'll create an umbrella issue to track all those issues?
> This ticket should be split into more manageable pieces. I think they might be:
> - Define appropriate IR constructs to identify polynomial representations in NTT/FFT/coefficient domains. I believe we discussed using the tensor encoding attribute to store something like "this tensor is an NTT-domain polynomial with parameters ..."
> - Write a pass that converts "naive" polynomial multiplications into combinations of NTT + element-wise mul + INTT
> - Write canonicalizations that collapse unnecessary NTT/INTT
> - Write an initial lowering for NTT/INTT using a standard algorithm
> - Support lowering a linalg.generic of polymuls (should already be supported by @AlexanderViand-Intel's recent work)
> - Write optimized NTT/INTT lowerings for special hardware (e.g., AVX)
Since this was written, the following have been completed (though they may not be fully polished):
- [ ] (unclear?) Define appropriate IR constructs to identify polynomial representations in NTT/FFT/coefficient domains. I believe we discussed using the tensor encoding attribute to store something like "this tensor is an NTT-domain polynomial with parameters ..."
- [x] Write a pass that converts "naive" polynomial multiplications into combinations of NTT + element-wise mul + INTT
- [x] Write canonicalizations that collapse unnecessary NTT/INTT
- [x] Write an initial lowering for NTT/INTT using a standard algorithm
- [x] Support lowering a linalg.generic of polymuls (should already be supported by @AlexanderViand-Intel's recent work)
- [ ] Write optimized NTT/INTT lowerings for special hardware (e.g., AVX) https://github.com/google/heir/issues/544
> Since this was written, the following have been completed (though they may not be fully polished):
> - [x] Support lowering a linalg.generic of polymuls (should already be supported by @AlexanderViand-Intel's recent work)
I'm not sure I actually implemented lowerings for linalg.generic (the -convert-elementwise-to-affine -polynomial-to-standard lowering chain uses affine.for instead), but if someone is interested in picking up this PR, I'm more than happy to add support for linalg.generic, too :)