heir Port remaining HECO evaluation artifacts and ensure they produce equivalently optimal results.

@AlexanderViand-Intel I may need your help here. I haven't figured out how to build HECO yet (everything succeeds in your repo instructions up to the linking step, then it fails with a bazillion linker errors to LLVM).

What I think I'd want is just the output MLIR from each of the evaluation artifacts, so that I can compare the rotation counts in them with what HEIR produces. I tried manually porting roberts-cross, and I think I see where HEIR is still lacking, but I want some confirmation.

Mar 27 '24 21:03 j2kun

Last one appears to be upgrading the roberts cross 4x4 to 64x64 (though we know this will be very slow due to https://github.com/google/heir/issues/589). Maybe in the mean time we can limit it to 32x32.

Apr 02 '24 21:04 j2kun

I'm trying to get up to speed after RWC/Easter Holidays, and I saw that you mentioned compile times in the order of 50 minutes in #589, which sounds an order of magnitude worse than what HECO gets. In HECO, I did a few tricks to speed things up, such as moving one or two patterns out of normal canonicalization into their own pass, maybe we can do something similar here.

EDIT: It seems like the PR (#587) mentioned in that issue is somehow borked - there's no commits?

@AlexanderViand-Intel I may need your help here. I haven't figured out how to build HECO yet (everything succeeds in your repo instructions up to the linking step, then it fails with a bazillion linker errors to LLVM).

Mh, that's odd - the repo, and the artifact tag specifically, should "just" work, having been through the USENIX artifact evaluation process. What linker did you use? lld or something else?

What I think I'd want is just the output MLIR from each of the evaluation artifacts, so that I can compare the rotation counts in them with what HEIR produces. I tried manually porting roberts-cross, and I think I see where HEIR is still lacking, but I want some confirmation.

Sure, let me run those for you and figure out how best to post them.

Apr 04 '24 11:04 AlexanderViand-Intel

That is bizarre about copybara. Here's the commit: https://github.com/google/heir/commit/4d84f04d7d82fa8507e26ef8968b675772def48a

Apr 04 '24 13:04 j2kun

Yeah I was thinking I could extract the needed patterns (converting the cleartext weight tensor to individual constants would suffice), but haven't had the chance to figure out exactly which pattern does that in canonicalize.

Apr 04 '24 13:04 j2kun

What I think I'd want is just the output MLIR from each of the evaluation artifacts, so that I can compare the rotation counts in them with what HEIR produces. I tried manually porting roberts-cross, and I think I see where HEIR is still lacking, but I want some confirmation.

Sure, let me run those for you and figure out how best to post them.

Here's that HECO output, I figured adding it to the repo as another tag would be easier than creating a bunch of gists: HECO@artifact_mlir_output

Apr 08 '24 14:04 AlexanderViand-Intel

Summarizing:

	boxblur_64x64.mlir	dotproduct_8.mlir	gxkernel_64x64.mlir	hammingdistance_4.mlir	linearpolynomial_64.mlir	quadraticpolynomial_64.mlir	robertscross_64x64.mlir
HEIR	7	3	6 (8x8, not 64x64)	3	0	0	2 (4x4)
HECO	3	3	5	3	0	0	4

So remaining work is:

Improve box_blur
upgrade gx_kernel to 64x64 from 8x8 and roberts_cross from 4x4 to 64x64 (https://github.com/google/heir/pull/604)
Scrutinize roberts_cross to ensure that HEIR's "better" program is actually correct.

Apr 08 '24 18:04 j2kun

Scrutinize roberts_cross to ensure that HEIR's "better" program is actually correct.

👀 I think HECO generates the same program as porcupine here (which should be optimal, so I'd be surprised to see something better) but let me check whether this is actually the case.

EDIT: I looked it up and Porcupine's solution is better, but it still needs 3 rotations, not 2.

Apr 09 '24 08:04 AlexanderViand-Intel

After discussion, we realized the reason HEIR seems to do worse on the box_blur example is because HECO's kernel is 2x2, while HEIR's is 3x3, so HEIR's should be equivalent to HECO on that example. robertscross_64x64 is the remaining example to inspect.

Apr 18 '24 13:04 j2kun

Now that the first e2e runtime test is done (simple_sum_test.cpp), I'm looking at adding analogous tests for the rest of the HECO examples to bolster our confidence in their correctness.

@AlexanderViand-Intel: do you happen to have any input-output test pairs lying around from HECO? Just to save some time manually generating them.

Apr 26 '24 17:04 j2kun

Roberts cross had a bug in the input MLIR that is fixed in https://github.com/google/heir/pull/676

May 08 '24 18:05 j2kun

@AlexanderViand-Intel: do you happen to have any input-output test pairs lying around from HECO? Just to save some time manually generating them.

Sorry, everything from HECO would be SEAL based (e.g., Porcupine-based SEAL implementation), so I think what you ended up doing is the most effective way forward.

May 13 '24 06:05 AlexanderViand-Intel

So remaining work is:

Improve box_blur

I think this is the last open item in this issue, right? According to the box_blur_64x64.mlir test, HEIR still generates a solution with 7 rotations, some of which are rotations not of the input image, but of intermediate results.

May 13 '24 06:05 AlexanderViand-Intel

So remaining work is:

Improve box_blur

I think this is the last open item in this issue, right? According to the box_blur_64x64.mlir test, HEIR still generates a solution with 7 rotations, some of which are rotations not of the input image, but of intermediate results.

From https://github.com/google/heir/issues/571#issuecomment-2063841078, we agreed that it was 7 because the kernel is 3x3, whereas HECO did 2x2.

The gx_kernel still has one more rotation than HECO that I haven't explained, and it also doesn't have an end-to-end test yet.

May 13 '24 16:05 j2kun

From #571 (comment), we agreed that it was 7 because the kernel is 3x3, whereas HECO did 2x2.

🤦 Sorry, I got confused by seeing the intermediate-value rotations, which is structurally different from what HECO generates. This is actually a good thing, though , as HEIR does in fact get 3 rotations in 2x2 kernel mode, but HECO for 3x3 requires (as I'd expect) 8 rotations (kernel_size - 1).

The gx_kernel still has one more rotation than HECO that I haven't explained, and it also doesn't have an end-to-end test yet.

I wouldn't be surprised if it turns out this is another bug in the HECO inputs 🙈 as it'd be odd for HEIR to successfully handle kernels of this style but not two such kernels in sequence.

May 13 '24 17:05 AlexanderViand-Intel

Closing and removing remaining work to https://github.com/google/heir/issues/758

Jun 30 '24 12:06 j2kun