heir icon indicating copy to clipboard operation
heir copied to clipboard

Port remaining HECO evaluation artifacts and ensure they produce equivalently optimal results.

Open j2kun opened this issue 1 year ago • 14 comments

@AlexanderViand-Intel I may need your help here. I haven't figured out how to build HECO yet (everything succeeds in your repo instructions up to the linking step, then it fails with a bazillion linker errors to LLVM).

What I think I'd want is just the output MLIR from each of the evaluation artifacts, so that I can compare the rotation counts in them with what HEIR produces. I tried manually porting roberts-cross, and I think I see where HEIR is still lacking, but I want some confirmation.

j2kun avatar Mar 27 '24 21:03 j2kun

Last one appears to be upgrading the roberts cross 4x4 to 64x64 (though we know this will be very slow due to https://github.com/google/heir/issues/589). Maybe in the mean time we can limit it to 32x32.

j2kun avatar Apr 02 '24 21:04 j2kun

I'm trying to get up to speed after RWC/Easter Holidays, and I saw that you mentioned compile times in the order of 50 minutes in #589, which sounds an order of magnitude worse than what HECO gets. In HECO, I did a few tricks to speed things up, such as moving one or two patterns out of normal canonicalization into their own pass, maybe we can do something similar here.

EDIT: It seems like the PR (#587) mentioned in that issue is somehow borked - there's no commits?

@AlexanderViand-Intel I may need your help here. I haven't figured out how to build HECO yet (everything succeeds in your repo instructions up to the linking step, then it fails with a bazillion linker errors to LLVM).

Mh, that's odd - the repo, and the artifact tag specifically, should "just" work, having been through the USENIX artifact evaluation process. What linker did you use? lld or something else?

What I think I'd want is just the output MLIR from each of the evaluation artifacts, so that I can compare the rotation counts in them with what HEIR produces. I tried manually porting roberts-cross, and I think I see where HEIR is still lacking, but I want some confirmation.

Sure, let me run those for you and figure out how best to post them.

AlexanderViand-Intel avatar Apr 04 '24 11:04 AlexanderViand-Intel

That is bizarre about copybara. Here's the commit: https://github.com/google/heir/commit/4d84f04d7d82fa8507e26ef8968b675772def48a

j2kun avatar Apr 04 '24 13:04 j2kun

Yeah I was thinking I could extract the needed patterns (converting the cleartext weight tensor to individual constants would suffice), but haven't had the chance to figure out exactly which pattern does that in canonicalize.

j2kun avatar Apr 04 '24 13:04 j2kun

What I think I'd want is just the output MLIR from each of the evaluation artifacts, so that I can compare the rotation counts in them with what HEIR produces. I tried manually porting roberts-cross, and I think I see where HEIR is still lacking, but I want some confirmation.

Sure, let me run those for you and figure out how best to post them.

Here's that HECO output, I figured adding it to the repo as another tag would be easier than creating a bunch of gists: HECO@artifact_mlir_output

AlexanderViand-Intel avatar Apr 08 '24 14:04 AlexanderViand-Intel

Summarizing:

boxblur_64x64.mlir dotproduct_8.mlir gxkernel_64x64.mlir hammingdistance_4.mlir linearpolynomial_64.mlir quadraticpolynomial_64.mlir robertscross_64x64.mlir
HEIR 7 3 6 (8x8, not 64x64) 3 0 0 2 (4x4)
HECO 3 3 5 3 0 0 4

So remaining work is:

  • Improve box_blur
  • upgrade gx_kernel to 64x64 from 8x8 and roberts_cross from 4x4 to 64x64 (https://github.com/google/heir/pull/604)
  • Scrutinize roberts_cross to ensure that HEIR's "better" program is actually correct.

j2kun avatar Apr 08 '24 18:04 j2kun

  • Scrutinize roberts_cross to ensure that HEIR's "better" program is actually correct.

👀 I think HECO generates the same program as porcupine here (which should be optimal, so I'd be surprised to see something better) but let me check whether this is actually the case.

EDIT: I looked it up and Porcupine's solution is better, but it still needs 3 rotations, not 2.

AlexanderViand-Intel avatar Apr 09 '24 08:04 AlexanderViand-Intel

After discussion, we realized the reason HEIR seems to do worse on the box_blur example is because HECO's kernel is 2x2, while HEIR's is 3x3, so HEIR's should be equivalent to HECO on that example. robertscross_64x64 is the remaining example to inspect.

j2kun avatar Apr 18 '24 13:04 j2kun

Now that the first e2e runtime test is done (simple_sum_test.cpp), I'm looking at adding analogous tests for the rest of the HECO examples to bolster our confidence in their correctness.

@AlexanderViand-Intel: do you happen to have any input-output test pairs lying around from HECO? Just to save some time manually generating them.

j2kun avatar Apr 26 '24 17:04 j2kun

Roberts cross had a bug in the input MLIR that is fixed in https://github.com/google/heir/pull/676

j2kun avatar May 08 '24 18:05 j2kun

@AlexanderViand-Intel: do you happen to have any input-output test pairs lying around from HECO? Just to save some time manually generating them.

Sorry, everything from HECO would be SEAL based (e.g., Porcupine-based SEAL implementation), so I think what you ended up doing is the most effective way forward.

AlexanderViand-Intel avatar May 13 '24 06:05 AlexanderViand-Intel

So remaining work is:

  • Improve box_blur

I think this is the last open item in this issue, right? According to the box_blur_64x64.mlir test, HEIR still generates a solution with 7 rotations, some of which are rotations not of the input image, but of intermediate results.

AlexanderViand-Intel avatar May 13 '24 06:05 AlexanderViand-Intel

So remaining work is:

  • Improve box_blur

I think this is the last open item in this issue, right? According to the box_blur_64x64.mlir test, HEIR still generates a solution with 7 rotations, some of which are rotations not of the input image, but of intermediate results.

From https://github.com/google/heir/issues/571#issuecomment-2063841078, we agreed that it was 7 because the kernel is 3x3, whereas HECO did 2x2.

The gx_kernel still has one more rotation than HECO that I haven't explained, and it also doesn't have an end-to-end test yet.

j2kun avatar May 13 '24 16:05 j2kun

From #571 (comment), we agreed that it was 7 because the kernel is 3x3, whereas HECO did 2x2.

🤦 Sorry, I got confused by seeing the intermediate-value rotations, which is structurally different from what HECO generates. This is actually a good thing, though , as HEIR does in fact get 3 rotations in 2x2 kernel mode, but HECO for 3x3 requires (as I'd expect) 8 rotations (kernel_size - 1).

The gx_kernel still has one more rotation than HECO that I haven't explained, and it also doesn't have an end-to-end test yet.

I wouldn't be surprised if it turns out this is another bug in the HECO inputs 🙈 as it'd be odd for HEIR to successfully handle kernels of this style but not two such kernels in sequence.

AlexanderViand-Intel avatar May 13 '24 17:05 AlexanderViand-Intel

Closing and removing remaining work to https://github.com/google/heir/issues/758

j2kun avatar Jun 30 '24 12:06 j2kun