[poly]: Implement an end-to-end Dilithium signature scheme to exercise the polynomial dialect lowerings
As the title says.
We considered the Dilithium signature scheme and the Kyber KEM as initial "end-to-end" examples for the polynomial dialect and this lowering. While Dilithium shouldn't be an issue, Kyber's ring doesn't support a full-depth NTT (Kyber's q = 3329 satisfies q ≡ 1 mod 256 but not q ≡ 1 mod 512, so X^256 + 1 only splits into quadratic factors over Z_q), so the lowering would have to be aware of this (see https://electricdusk.com/ntt.html for a nice explanation of the issue).
Let's make this issue a bit more specific for anyone reading along who isn't familiar with the details.
Cf. the Dilithium spec, section 1.2 "Implementation Considerations" (page 5) of https://pq-crystals.org/dilithium/data/dilithium-specification-round3-20210208.pdf
They work over Z_q[X]/(X^256 + 1) and do a matrix-vector multiplication where the matrix has size 6x5 (for the Dilithium3 parameter set; the other sets use different dimensions). They use the "fully-splitting NTT algorithm" so that the polynomials can be multiplied componentwise after being transformed.
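To make the componentwise-multiplication point concrete, here is a toy Python sketch (illustration only, not HEIR/MLIR code and not the Dilithium reference implementation) of multiplication in Z_q[X]/(X^n + 1) via a fully-splitting negacyclic NTT. It uses tiny parameters (n = 8, q = 17) instead of Dilithium's n = 256, q = 8380417; in both cases 2n divides q - 1, which is exactly what makes the full split into linear factors possible:

```python
# Illustration only: multiplication in Z_q[X]/(X^n + 1) via a fully-splitting
# negacyclic NTT, with toy parameters (Dilithium uses n = 256, q = 8380417).

def find_psi(n, q):
    """Brute-force a primitive 2n-th root of unity mod q (toy sizes only)."""
    for psi in range(2, q):
        if pow(psi, n, q) == q - 1:   # psi^n = -1, so psi has order 2n (n a power of two)
            return psi
    raise ValueError("need 2n | q - 1 for a fully-splitting NTT")

def ntt(a, psi, q):
    """Naive O(n^2) negacyclic NTT: evaluate a(X) at the odd powers of psi."""
    n = len(a)
    return [sum(a[j] * pow(psi, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(ah, psi, q):
    """Inverse of ntt(), including the 1/n scaling (q is prime)."""
    n = len(ah)
    n_inv, psi_inv = pow(n, q - 2, q), pow(psi, q - 2, q)
    return [sum(ah[i] * pow(psi_inv, (2 * i + 1) * j, q) for i in range(n))
            * n_inv % q for j in range(n)]

def poly_mul(a, b, psi, q):
    """a * b mod (X^n + 1, q) as NTT -> componentwise mul -> INTT."""
    return intt([x * y % q for x, y in zip(ntt(a, psi, q), ntt(b, psi, q))],
                psi, q)

# Toy check against schoolbook negacyclic multiplication.
n, q = 8, 17                          # 2n = 16 divides q - 1 = 16
psi = find_psi(n, q)
a, b = [1, 2, 3, 4, 0, 0, 0, 0], [5, 6, 7, 8, 0, 0, 0, 0]
ref = [0] * n
for i in range(n):
    for j in range(n):
        sign = -1 if i + j >= n else 1
        ref[(i + j) % n] = (ref[(i + j) % n] + sign * a[i] * b[j]) % q
assert poly_mul(a, b, psi, q) == ref
```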
This ticket should be split into more manageable pieces. I think they might be:
- Define appropriate IR constructs to identify polynomial representations in NTT/FFT/coefficient domains. I believe we discussed using the tensor encoding attribute to store something like "this tensor is an NTT-domain polynomial with parameters ..."
- Write a pass that converts "naive" polynomial multiplications into combinations of NTT + element-wise mul + INTT
- Write canonicalizations that collapse unnecessary NTT/INTT
- Write an initial lowering for NTT/INTT using a standard algorithm (see the sketch below)
- Support lowering a linalg.generic of polymuls (should already be supported by @AlexanderViand-Intel's recent work)
- Write optimized NTT/INTT lowerings for special hardware (e.g., AVX)
If you agree, I can go ahead and file those issues, and this ticket can perhaps be switched to adding an end-to-end test of a Dilithium signature.
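On the "initial lowering for NTT/INTT using a standard algorithm" item: the usual candidate is the iterative O(n log n) butterfly formulation, i.e. a Cooley-Tukey forward transform and a Gentleman-Sande inverse with the negacyclic twist folded into precomputed twiddle factors (the structure described by Longa and Naehrig in "Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography"). Here is a minimal Python sketch of that loop structure; an actual lowering would emit the equivalent loop nest in IR (e.g. affine.for) rather than Python, and the constants are toy values, not Dilithium's:

```python
# Iterative Cooley-Tukey forward NTT / Gentleman-Sande inverse NTT with the
# negacyclic psi twist merged into the twiddle factors.  Toy parameters only.

def bit_reverse(x, bits):
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def ntt_ct(a, psi, q):
    """Forward negacyclic NTT; standard-order input, bit-reversed output."""
    a, n = list(a), len(a)
    logn = n.bit_length() - 1
    psi_rev = [pow(psi, bit_reverse(i, logn), q) for i in range(n)]
    t, m = n, 1
    while m < n:
        t //= 2
        for i in range(m):
            s = psi_rev[m + i]
            for j in range(2 * i * t, 2 * i * t + t):
                u, v = a[j], a[j + t] * s % q
                a[j], a[j + t] = (u + v) % q, (u - v) % q
        m *= 2
    return a

def intt_gs(a, psi, q):
    """Inverse negacyclic NTT; bit-reversed input, standard-order output."""
    a, n = list(a), len(a)
    logn = n.bit_length() - 1
    psi_inv = pow(psi, q - 2, q)
    psi_inv_rev = [pow(psi_inv, bit_reverse(i, logn), q) for i in range(n)]
    t, m = 1, n
    while m > 1:
        h, j1 = m // 2, 0
        for i in range(h):
            s = psi_inv_rev[h + i]
            for j in range(j1, j1 + t):
                u, v = a[j], a[j + t]
                a[j], a[j + t] = (u + v) % q, (u - v) * s % q
            j1 += 2 * t
        t *= 2
        m //= 2
    n_inv = pow(n, q - 2, q)
    return [x * n_inv % q for x in a]

# Pointwise products in the transform domain give negacyclic multiplication.
n, q, psi = 4, 17, 2                  # psi = 2 is a primitive 2n-th root mod 17
a, b = [1, 2, 3, 4], [2, 0, 1, 0]
assert intt_gs(ntt_ct(a, psi, q), psi, q) == a      # round trip
c = intt_gs([x * y % q for x, y in zip(ntt_ct(a, psi, q), ntt_ct(b, psi, q))],
            psi, q)
assert c == [16, 0, 7, 10]            # == a * b mod (X^4 + 1, 17)
```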
Let me also add two FHE-specific references that are likely to be useful (in addition to more general work on fast NTT/modular arithmetic implementation):
- Intel HEXL: Accelerating Homomorphic Encryption with Intel AVX512-IFMA52, which gives some details on how to exploit AVX512 for HE
- HEaaN.MLIR: An Optimizing Compiler for Fast Ring-Based Homomorphic Encryption, which has done exactly this work of implementing fast FHE-friendly polynomial ring arithmetic in MLIR; it is unfortunately not open-source, but might still be a good reference.
> This ticket should be split into more manageable pieces. I think they might be [...] If you agree, I can go ahead and file those issues, and this ticket can perhaps be switched to adding an end-to-end test of a Dilithium signature.
Yes, I think that'd be a good idea! I'm guessing you'll create an umbrella issue to track all those issues?
> This ticket should be split into more manageable pieces. I think they might be:
> - Define appropriate IR constructs to identify polynomial representations in NTT/FFT/coefficient domains. I believe we discussed using the tensor encoding attribute to store something like "this tensor is an NTT-domain polynomial with parameters ..."
> - Write a pass that converts "naive" polynomial multiplications into combinations of NTT + element-wise mul + INTT
> - Write canonicalizations that collapse unnecessary NTT/INTT
> - Write an initial lowering for NTT/INTT using a standard algorithm
> - Support lowering a linalg.generic of polymuls (should already be supported by @AlexanderViand-Intel's recent work)
> - Write optimized NTT/INTT lowerings for special hardware (e.g., AVX)
Since this was written, the following have been completed (though they may not be fully polished):
- [ ] (unclear?) Define appropriate IR constructs to identify polynomial representations in NTT/FFT/coefficient domains. I believe we discussed using the tensor encoding attribute to store something like "this tensor is an NTT-domain polynomial with parameters ..."
- [x] Write a pass that converts "naive" polynomial multiplications into combinations of NTT + element-wise mul + INTT
- [x] Write canonicalizations that collapse unnecessary NTT/INTT
- [x] Write an initial lowering for NTT/INTT using a standard algorithm
- [x] Support lowering a linalg.generic of polymuls (should already be supported by @AlexanderViand-Intel's recent work)
- [ ] Write optimized NTT/INTT lowerings for special hardware (e.g., AVX) https://github.com/google/heir/issues/544
> Since this was written, the following have been completed (though they may not be fully polished):
> - [x] Support lowering a linalg.generic of polymuls (should already be supported by @AlexanderViand-Intel's recent work)
I'm not sure I actually implemented lowerings for linalg.generic (the -convert-elementwise-to-affine -polynomial-to-standard lowering chain uses affine.for instead), but if someone is interested in picking up this PR, I'm more than happy to add support for linalg.generic, too :)