Improved find_constant_bound(s)

Open rootjalex opened this issue 3 years ago • 24 comments

This PR provides a series of methods for removing/simplifying correlated expressions for find_constant_bounds:

  • Bounded let-substitutions (~~n=100~~ edit: n=16). We don't want to always substitute all lets, but some constant bounds can be calculated just by a small number of substitutions.
  • Removing unbounded terms from mins/maxs. A (simplistic) example is below:
Find lower bound on:
max(x, y) - (z + y)
With z : [0, 8]

This method would note that x is unbounded, and therefore the lhs of the max can be stripped, producing:

y - (z + y) -> 0 - z -> lower bounded by -8
  • Affine term reordering. Halide’s TRS-based simplification can only cancel terms in sums up to a certain depth; this method uses a linear-time algorithm for canceling like terms.
  • Pushing rationals inwards. This technique pushes multiplications inwards to allow stronger simplification. More importantly, it pushes divisions inwards via a safe approximation, best captured by the following inequalities:
// Addition:
(a / n) + (b / n) <= (a + b) / n <= (a / n) + (b / n) + 1
// Subtraction:
(a / n) - (b / n) - 1 <= (a - b) / n <= (a / n) - (b / n)

This allows us to push divisions inside additions/subtractions which can improve the ability to cancel like terms in a lot of generated equations.
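As a rough illustration of the last three bullets, here is a minimal Python sketch, not the code in this PR. It assumes floor division with a positive divisor `n` (which is what Python's `//` does), and the helper names `cancel_like_terms`, `lower_bound`, and `division_bounds_hold` are hypothetical. It cancels like terms in a single pass over (coefficient, variable) pairs, reproduces the `-8` lower bound from the example above, and brute-forces the two division inequalities over a small range.

```python
from collections import Counter
from itertools import product


def cancel_like_terms(terms):
    """Sum (coefficient, variable) pairs in one pass and drop zero coefficients."""
    totals = Counter()
    for coeff, var in terms:
        totals[var] += coeff
    return {var: c for var, c in totals.items() if c != 0}


def lower_bound(affine, intervals):
    """Lower bound of sum(c * var), given a (min, max) interval per variable."""
    return sum(c * (intervals[v][0] if c > 0 else intervals[v][1])
               for v, c in affine.items())


def division_bounds_hold(limit=20):
    """Brute-force the push-division-inward inequalities (floor division, n > 0)."""
    for a, b in product(range(-limit, limit + 1), repeat=2):
        for n in range(1, limit + 1):
            qa, qb = a // n, b // n
            if not (qa + qb <= (a + b) // n <= qa + qb + 1):
                return False
            if not (qa - qb - 1 <= (a - b) // n <= qa - qb):
                return False
    return True


# With the unbounded x stripped, max(x, y) - (z + y) is bounded below by y - (z + y).
remaining = cancel_like_terms([(1, "y"), (-1, "z"), (-1, "y")])
print(remaining)                              # {'z': -1}
print(lower_bound(remaining, {"z": (0, 8)}))  # -8
print(division_bounds_hold())                 # True
```

The brute-force check is only a sanity test over a small range, not a proof; the inequalities follow from writing `a` and `b` as quotient times `n` plus a remainder in `[0, n)`.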

@abadams ran a series of experiments with randomly-generated schedules (n=256) on a series of apps (bgu, camera_pipe, conv_layer, depthwise_separable_conv, harris, hist, iir_blur, lens_blur, max_filter, stencil_chain, unsharp), and here is a summary of the results (percentages are total across the benchmarks):

  • Fewer failed unrolls: bgu (5 -> 3), camera_pipe (69 -> 58), harris (197 -> 160), lens_blur (158 -> 12), max_filter (338 -> 242), unsharp (110 -> 32)
  • Less memory: camera_pipe (0.6%), depthwise_separable_conv (0.3%), hist (0.1%), lens_blur (0.6%), unsharp (0.2%)
  • Fewer malloc calls: camera_pipe (173592 -> 171612), harris (608943 -> 608655), lens_blur (144324 -> 141888), stencil_chain (701085 -> 698712), unsharp (127428 -> 127284)
  • Some small runtime improvements (0.05% to 0.6%): bgu, camera_pipe, harris, hist, iir_blur, lens_blur, max_filter, stencil_chain

More memory: harris (0.06%), iir_blur (0.002%), stencil_chain (0.1%)

The runtime improvements might not be statistically significant, but I think better loop unrolling and improved stack allocations are important contributions.

For apps with no improved unrolling, compilation times increase by a small amount (~3%). For apps with improved unrolling, the increases are large, but they are mostly due to the fact that generating the unrolled code takes longer in both our codegen and LLVM's codegen.

This work was part of a project with @abadams and @shoaibkamil .

rootjalex avatar May 31 '22 17:05 rootjalex

@abadams -- should I pull this into Google and do some torture testing before landing, or are we pretty confident this is good?

steven-johnson avatar Jun 01 '22 19:06 steven-johnson

Torture testing inside Google would be pretty helpful, thanks.

abadams avatar Jun 01 '22 20:06 abadams

Testing in Google, I find only one new failure, but... it appears to be a hang (or near-infinite loop) inside Bounding small realizations... when compiling one specific Generator. Adding to the fun, it's in some proprietary stuff that might be hard to share publicly. Let me see if I can narrow things down further...

steven-johnson avatar Jun 02 '22 18:06 steven-johnson

Yeah, we definitely get stuck ~forever in bound_small_allocations(), which was only changed to include the new header, so something about the change in definition has injected something here. Let me see if I can come up with a repro case I can share.

steven-johnson avatar Jun 02 '22 18:06 steven-johnson

bound_small_allocations() is calling the new version(s) of find_constant_bound(s), which means that there is likely an allocation expression that is tripping up the new method - if you could log which expression (and the corresponding scope) is causing the hang, I can investigate (hopefully sharing that much is okay?).

rootjalex avatar Jun 02 '22 18:06 rootjalex

So far, what I'm finding is that we have a fairly complex Expr that is the input to remove_unbounded_terms:

(((let t418 = min(max(min(max(min(max(min(max(min(max(min(max(min(max(min((foo$13.extent.0 + foo$13.min.0) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0) in (max(min(max(min(max(min(max(min(max(min(max(min(max(t418, 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), max(min(max(min(max(min(max(min(max(max(min(max(t418, 1) + 4, input.extent.0), t418), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 1) + 3, input.extent.0), 1) + -1)) - (let t407 = min(max(min(min(min(max(foo$13.min.0, -1), min(max(min(input.extent.0 + -1, foo$13.min.0), 0), max(min(foo$13.min.0 + 2, input.extent.0), 1) + -1)) + 2, input.extent.0) + -1, foo$13.min.0), 0), max(min(min(max(min(min(max(foo$13.min.0, -1) + 2, input.extent.0) + -1, foo$13.min.0), 0), max(min(min(max(min(input.extent.0 + -1, foo$13.min.0), 0), max(min(foo$13.min.0 + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) in (let t411 = min(max(min(min(min(min(max(min(input.extent.0 + -1, t407), 0), max(min(t407 + 2, input.extent.0), 1) + -1), t407) + 2, input.extent.0) + -1, t407), 0), max(min(min(max(min(min(t407 + 2, input.extent.0) + -1, t407), 0), max(min(min(max(min(input.extent.0 + -1, t407), 0), max(min(t407 + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) in (let t423 = min(max(min(min(min(min(max(min(input.extent.0 + -1, t411), 0), max(min(t411 + 2, input.extent.0), 1) + -1), t411) + 2, input.extent.0) + -1, t411), 0), max(min(min(max(min(min(t411 + 2, input.extent.0) + -1, t411), 0), max(min(min(max(min(input.extent.0 + -1, t411), 0), max(min(t411 + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) in (let t427 = 
min(max(min(min(min(min(max(min(input.extent.0 + -1, t423), 0), max(min(t423 + 2, input.extent.0), 1) + -1), t423) + 2, input.extent.0) + -1, t423), 0), max(min(min(max(min(min(t423 + 2, input.extent.0) + -1, t423), 0), max(min(min(max(min(input.extent.0 + -1, t423), 0), max(min(t423 + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) in min(max(min(min(t427 + 2, input.extent.0) + -1, t427), 0), max(min(min(min(max(min(input.extent.0 + -1, t427), 0), min(min(max(min(t427 + 2, input.extent.0), 1), min(min(min(max(min(input.extent.0 + -1, t427), 0), min(min(max(min(t427 + 2, input.extent.0), 1), min(min(min(max(min(input.extent.0 + -1, t427), 0), min(min(max(min(t427 + 2, input.extent.0), 1), min(min(min(max(min(input.extent.0 + -1, t427), 0), min(min(max(min(t427 + 2, input.extent.0), 1), min(min(min(max(min(input.extent.0 + -1, t427), 0), min(max(min(t427 + 2, input.extent.0), 1), min(max(min(input.extent.0 + -1, t427), 0), min(max(min(t427 + 2, input.extent.0), 1), min(max(min(input.extent.0 + -1, t427), 0), min(max(min(t427 + 2, input.extent.0), 1), max(min(input.extent.0 + -1, t427), 0) + 1)) + 1)) + 1)), max(min(t427 + 2, input.extent.0), 1) + -1), min(max(min(input.extent.0 + -1, t427), 0), max(min(t427 + 2, input.extent.0), 1) + -1) + 2)), max(min(input.extent.0 + -1, t427), 0) + 1)), max(min(t427 + 2, input.extent.0), 1) + -1), min(max(min(input.extent.0 + -1, t427), 0), max(min(t427 + 2, input.extent.0), 1) + -1) + 2)), max(min(input.extent.0 + -1, t427), 0) + 1)), max(min(t427 + 2, input.extent.0), 1) + -1), min(max(min(input.extent.0 + -1, t427), 0), max(min(t427 + 2, input.extent.0), 1) + -1) + 2)), max(min(input.extent.0 + -1, t427), 0) + 1)), max(min(t427 + 2, input.extent.0), 1) + -1), min(max(min(input.extent.0 + -1, t427), 0), max(min(t427 + 2, input.extent.0), 1) + -1) + 2)), max(min(input.extent.0 + -1, t427), 0) + 1)), max(min(t427 + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1)))))) + 1)

which becomes insanely huge afterwards (too large to bother pasting here -- something like 7MB of text when the Expr is printed), and that's after the call to simplify().

EDIT: the corresponding scope at that point:

 scope:
{
  output$1.s0.x.max
  output$1.s0.x.min
  output$1.s0.y.max
  output$1.s0.y.min
  output$1.s1.r8$x.max
  output$1.s1.r8$x.min
  output$1.s1.x.max
  output$1.s1.x.min
  output$1.s1.y.max
  output$1.s1.y.min
  output$10.s0.x.max
  output$10.s0.x.min
  output$10.s0.y.max
  output$10.s0.y.min
  output$10.s1.r125$x.max
  output$10.s1.r125$x.min
  output$10.s1.x.max
  output$10.s1.x.min
  output$10.s1.y.max
  output$10.s1.y.min
  output$11.s0.x.max
  output$11.s0.x.min
  output$11.s0.y.max
  output$11.s0.y.min
  output$11.s1.r138$x.max
  output$11.s1.r138$x.min
  output$11.s1.x.max
  output$11.s1.x.min
  output$11.s1.y.max
  output$11.s1.y.min
  output$12.s0.x.max
  output$12.s0.x.min
  output$12.s0.y.max
  output$12.s0.y.min
  output$12.s1.r151$x.max
  output$12.s1.r151$x.min
  output$12.s1.x.max
  output$12.s1.x.min
  output$12.s1.y.max
  output$12.s1.y.min
  output$13.s0.x.max
  output$13.s0.x.min
  output$13.s0.y.max
  output$13.s0.y.min
  output$13.s1.r164$x.max
  output$13.s1.r164$x.min
  output$13.s1.x.max
  output$13.s1.x.min
  output$13.s1.y.max
  output$13.s1.y.min
  output$14.s0.x.max
  output$14.s0.x.min
  output$14.s0.y.max
  output$14.s0.y.min
  output$14.s1.r177$x.max
  output$14.s1.r177$x.min
  output$14.s1.x.max
  output$14.s1.x.min
  output$14.s1.y.max
  output$14.s1.y.min
  output$2.s0.x.max
  output$2.s0.x.min
  output$2.s0.y.max
  output$2.s0.y.min
  output$2.s1.r21$x.max
  output$2.s1.r21$x.min
  output$2.s1.x.max
  output$2.s1.x.min
  output$2.s1.y.max
  output$2.s1.y.min
  output$3.s0.x.max
  output$3.s0.x.min
  output$3.s0.y.max
  output$3.s0.y.min
  output$3.s1.r34$x.max
  output$3.s1.r34$x.min
  output$3.s1.x.max
  output$3.s1.x.min
  output$3.s1.y.max
  output$3.s1.y.min
  output$4.s0.x.max
  output$4.s0.x.min
  output$4.s0.y.max
  output$4.s0.y.min
  output$4.s1.r47$x.max
  output$4.s1.r47$x.min
  output$4.s1.x.max
  output$4.s1.x.min
  output$4.s1.y.max
  output$4.s1.y.min
  output$5.s0.x.max
  output$5.s0.x.min
  output$5.s0.y.max
  output$5.s0.y.min
  output$5.s1.r60$x.max
  output$5.s1.r60$x.min
  output$5.s1.x.max
  output$5.s1.x.min
  output$5.s1.y.max
  output$5.s1.y.min
  output$6.s0.x.max
  output$6.s0.x.min
  output$6.s0.y.max
  output$6.s0.y.min
  output$6.s1.r73$x.max
  output$6.s1.r73$x.min
  output$6.s1.x.max
  output$6.s1.x.min
  output$6.s1.y.max
  output$6.s1.y.min
  output$7.s0.x.max
  output$7.s0.x.min
  output$7.s0.y.max
  output$7.s0.y.min
  output$7.s1.r86$x.max
  output$7.s1.r86$x.min
  output$7.s1.x.max
  output$7.s1.x.min
  output$7.s1.y.max
  output$7.s1.y.min
  output$8.s0.x.max
  output$8.s0.x.min
  output$8.s0.y.max
  output$8.s0.y.min
  output$8.s1.r99$x.max
  output$8.s1.r99$x.min
  output$8.s1.x.max
  output$8.s1.x.min
  output$8.s1.y.max
  output$8.s1.y.min
  output$9.s0.x.max
  output$9.s0.x.min
  output$9.s0.y.max
  output$9.s0.y.min
  output$9.s1.r112$x.max
  output$9.s1.r112$x.min
  output$9.s1.x.max
  output$9.s1.x.min
  output$9.s1.y.max
  output$9.s1.y.min
  foo$1.s0.x.max
  foo$1.s0.x.max.s
  foo$1.s0.x.min
  foo$1.s0.y.max
  foo$1.s0.y.max.s
  foo$1.s0.y.min
  foo$10.s0.x.max
  foo$10.s0.x.max.s
  foo$10.s0.x.min
  foo$10.s0.y.max
  foo$10.s0.y.max.s
  foo$10.s0.y.min
  foo$11.s0.x.max
  foo$11.s0.x.max.s
  foo$11.s0.x.min
  foo$11.s0.y.max
  foo$11.s0.y.max.s
  foo$11.s0.y.min
  foo$12.s0.x.max
  foo$12.s0.x.max.s
  foo$12.s0.x.min
  foo$12.s0.y.max
  foo$12.s0.y.max.s
  foo$12.s0.y.min
  foo$13.s0.x.max
  foo$13.s0.x.min
  foo$13.s0.y.max
  foo$13.s0.y.min
  foo$2.s0.x.max
  foo$2.s0.x.max.s
  foo$2.s0.x.min
  foo$2.s0.y.max
  foo$2.s0.y.max.s
  foo$2.s0.y.min
  foo$3.s0.x.max
  foo$3.s0.x.max.s
  foo$3.s0.x.min
  foo$3.s0.y.max
  foo$3.s0.y.max.s
  foo$3.s0.y.min
  foo$4.s0.x.max
  foo$4.s0.x.max.s
  foo$4.s0.x.min
  foo$4.s0.y.max
  foo$4.s0.y.max.s
  foo$4.s0.y.min
  foo$5.s0.x.max
  foo$5.s0.x.max.s
  foo$5.s0.x.min
  foo$5.s0.y.max
  foo$5.s0.y.max.s
  foo$5.s0.y.min
  foo$6.s0.x.max
  foo$6.s0.x.max.s
  foo$6.s0.x.min
  foo$6.s0.y.max
  foo$6.s0.y.max.s
  foo$6.s0.y.min
  foo$7.s0.x.max
  foo$7.s0.x.max.s
  foo$7.s0.x.min
  foo$7.s0.y.max
  foo$7.s0.y.max.s
  foo$7.s0.y.min
  foo$8.s0.x.max
  foo$8.s0.x.max.s
  foo$8.s0.x.min
  foo$8.s0.y.max
  foo$8.s0.y.max.s
  foo$8.s0.y.min
  foo$9.s0.x.max
  foo$9.s0.x.max.s
  foo$9.s0.x.min
  foo$9.s0.y.max
  foo$9.s0.y.max.s
  foo$9.s0.y.min
  foo.s0.x.max
  foo.s0.x.max.s
  foo.s0.x.min
  foo.s0.y.max
  foo.s0.y.max.s
  foo.s0.y.min
}

steven-johnson avatar Jun 02 '22 19:06 steven-johnson

Is it possible to know if the scope has any values actually set? Sorry, I didn't realize that printing scope only prints the names, I need the corresponding intervals as well.

rootjalex avatar Jun 02 '22 20:06 rootjalex

Definitely seems like the issue here is substitute_some_lets. Not sure exactly what the count should be, but 100 is too high
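To give a feel for why the substitution count matters, here is a toy size model in Python. It is a sketch under stated assumptions, not a measurement of the expression above: it assumes a simple chain of nested lets in which each let's value mentions the previous let's variable a fixed number of times, and `inlined_size` is a hypothetical helper name.

```python
def inlined_size(depth: int, uses_per_let: int, ops_per_let: int = 1) -> int:
    """Node count after fully inlining a chain of `depth` nested lets.

    Toy model: each let's value mentions the previous let's variable
    `uses_per_let` times and adds `ops_per_let` operator nodes of its own.
    """
    size = 1  # the innermost free variable
    for _ in range(depth):
        size = uses_per_let * size + ops_per_let
    return size


# Even with only two uses per let, the inlined size doubles at every level.
for depth in (4, 16, 100):
    print(depth, inlined_size(depth, uses_per_let=2))
```

In the expression pasted earlier, each `t4xx` variable appears far more than twice in the next let's value, so the growth per substituted let is correspondingly steeper than in this two-use model.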

rootjalex avatar Jun 02 '22 20:06 rootjalex

> Is it possible to know if the scope has any values actually set? Sorry, I didn't realize that printing scope only prints the names, I need the corresponding intervals as well.

{
  output$1.s0.x.max: 0, (void *)pos_inf
  output$1.s0.x.min: 0, (void *)pos_inf
  output$1.s0.y.max: 0, (void *)pos_inf
  output$1.s0.y.min: 0, (void *)pos_inf
  output$1.s1.r8$x.max: 3, 3
  output$1.s1.r8$x.min: 0, 0
  output$1.s1.x.max: 0, (void *)pos_inf
  output$1.s1.x.min: 0, (void *)pos_inf
  output$1.s1.y.max: 0, (void *)pos_inf
  output$1.s1.y.min: 0, (void *)pos_inf
  output$10.s0.x.max: 0, (void *)pos_inf
  output$10.s0.x.min: 0, (void *)pos_inf
  output$10.s0.y.max: 0, (void *)pos_inf
  output$10.s0.y.min: 0, (void *)pos_inf
  output$10.s1.r125$x.max: 3, 3
  output$10.s1.r125$x.min: 0, 0
  output$10.s1.x.max: 0, (void *)pos_inf
  output$10.s1.x.min: 0, (void *)pos_inf
  output$10.s1.y.max: 0, (void *)pos_inf
  output$10.s1.y.min: 0, (void *)pos_inf
  output$11.s0.x.max: 0, (void *)pos_inf
  output$11.s0.x.min: 0, (void *)pos_inf
  output$11.s0.y.max: 0, (void *)pos_inf
  output$11.s0.y.min: 0, (void *)pos_inf
  output$11.s1.r138$x.max: 3, 3
  output$11.s1.r138$x.min: 0, 0
  output$11.s1.x.max: 0, (void *)pos_inf
  output$11.s1.x.min: 0, (void *)pos_inf
  output$11.s1.y.max: 0, (void *)pos_inf
  output$11.s1.y.min: 0, (void *)pos_inf
  output$12.s0.x.max: 0, (void *)pos_inf
  output$12.s0.x.min: 0, (void *)pos_inf
  output$12.s0.y.max: 0, (void *)pos_inf
  output$12.s0.y.min: 0, (void *)pos_inf
  output$12.s1.r151$x.max: 3, 3
  output$12.s1.r151$x.min: 0, 0
  output$12.s1.x.max: 0, (void *)pos_inf
  output$12.s1.x.min: 0, (void *)pos_inf
  output$12.s1.y.max: 0, (void *)pos_inf
  output$12.s1.y.min: 0, (void *)pos_inf
  output$13.s0.x.max: 0, (void *)pos_inf
  output$13.s0.x.min: 0, (void *)pos_inf
  output$13.s0.y.max: 0, (void *)pos_inf
  output$13.s0.y.min: 0, (void *)pos_inf
  output$13.s1.r164$x.max: 3, 3
  output$13.s1.r164$x.min: 0, 0
  output$13.s1.x.max: 0, (void *)pos_inf
  output$13.s1.x.min: 0, (void *)pos_inf
  output$13.s1.y.max: 0, (void *)pos_inf
  output$13.s1.y.min: 0, (void *)pos_inf
  output$14.s0.x.max: (void *)neg_inf, (void *)pos_inf
  output$14.s0.x.min: (void *)neg_inf, (void *)pos_inf
  output$14.s0.y.max: (void *)neg_inf, (void *)pos_inf
  output$14.s0.y.min: (void *)neg_inf, (void *)pos_inf
  output$14.s1.r177$x.max: 3, 3
  output$14.s1.r177$x.min: 0, 0
  output$14.s1.x.max: (void *)neg_inf, (void *)pos_inf
  output$14.s1.x.min: (void *)neg_inf, (void *)pos_inf
  output$14.s1.y.max: (void *)neg_inf, (void *)pos_inf
  output$14.s1.y.min: (void *)neg_inf, (void *)pos_inf
  output$2.s0.x.max: 0, (void *)pos_inf
  output$2.s0.x.min: 0, (void *)pos_inf
  output$2.s0.y.max: 0, (void *)pos_inf
  output$2.s0.y.min: 0, (void *)pos_inf
  output$2.s1.r21$x.max: 3, 3
  output$2.s1.r21$x.min: 0, 0
  output$2.s1.x.max: 0, (void *)pos_inf
  output$2.s1.x.min: 0, (void *)pos_inf
  output$2.s1.y.max: 0, (void *)pos_inf
  output$2.s1.y.min: 0, (void *)pos_inf
  output$3.s0.x.max: 0, (void *)pos_inf
  output$3.s0.x.min: 0, (void *)pos_inf
  output$3.s0.y.max: 0, (void *)pos_inf
  output$3.s0.y.min: 0, (void *)pos_inf
  output$3.s1.r34$x.max: 3, 3
  output$3.s1.r34$x.min: 0, 0
  output$3.s1.x.max: 0, (void *)pos_inf
  output$3.s1.x.min: 0, (void *)pos_inf
  output$3.s1.y.max: 0, (void *)pos_inf
  output$3.s1.y.min: 0, (void *)pos_inf
  output$4.s0.x.max: 0, (void *)pos_inf
  output$4.s0.x.min: 0, (void *)pos_inf
  output$4.s0.y.max: 0, (void *)pos_inf
  output$4.s0.y.min: 0, (void *)pos_inf
  output$4.s1.r47$x.max: 3, 3
  output$4.s1.r47$x.min: 0, 0
  output$4.s1.x.max: 0, (void *)pos_inf
  output$4.s1.x.min: 0, (void *)pos_inf
  output$4.s1.y.max: 0, (void *)pos_inf
  output$4.s1.y.min: 0, (void *)pos_inf
  output$5.s0.x.max: 0, (void *)pos_inf
  output$5.s0.x.min: 0, (void *)pos_inf
  output$5.s0.y.max: 0, (void *)pos_inf
  output$5.s0.y.min: 0, (void *)pos_inf
  output$5.s1.r60$x.max: 3, 3
  output$5.s1.r60$x.min: 0, 0
  output$5.s1.x.max: 0, (void *)pos_inf
  output$5.s1.x.min: 0, (void *)pos_inf
  output$5.s1.y.max: 0, (void *)pos_inf
  output$5.s1.y.min: 0, (void *)pos_inf
  output$6.s0.x.max: 0, (void *)pos_inf
  output$6.s0.x.min: 0, (void *)pos_inf
  output$6.s0.y.max: 0, (void *)pos_inf
  output$6.s0.y.min: 0, (void *)pos_inf
  output$6.s1.r73$x.max: 3, 3
  output$6.s1.r73$x.min: 0, 0
  output$6.s1.x.max: 0, (void *)pos_inf
  output$6.s1.x.min: 0, (void *)pos_inf
  output$6.s1.y.max: 0, (void *)pos_inf
  output$6.s1.y.min: 0, (void *)pos_inf
  output$7.s0.x.max: 0, (void *)pos_inf
  output$7.s0.x.min: 0, (void *)pos_inf
  output$7.s0.y.max: 0, (void *)pos_inf
  output$7.s0.y.min: 0, (void *)pos_inf
  output$7.s1.r86$x.max: 3, 3
  output$7.s1.r86$x.min: 0, 0
  output$7.s1.x.max: 0, (void *)pos_inf
  output$7.s1.x.min: 0, (void *)pos_inf
  output$7.s1.y.max: 0, (void *)pos_inf
  output$7.s1.y.min: 0, (void *)pos_inf
  output$8.s0.x.max: 0, (void *)pos_inf
  output$8.s0.x.min: 0, (void *)pos_inf
  output$8.s0.y.max: 0, (void *)pos_inf
  output$8.s0.y.min: 0, (void *)pos_inf
  output$8.s1.r99$x.max: 3, 3
  output$8.s1.r99$x.min: 0, 0
  output$8.s1.x.max: 0, (void *)pos_inf
  output$8.s1.x.min: 0, (void *)pos_inf
  output$8.s1.y.max: 0, (void *)pos_inf
  output$8.s1.y.min: 0, (void *)pos_inf
  output$9.s0.x.max: 0, (void *)pos_inf
  output$9.s0.x.min: 0, (void *)pos_inf
  output$9.s0.y.max: 0, (void *)pos_inf
  output$9.s0.y.min: 0, (void *)pos_inf
  output$9.s1.r112$x.max: 3, 3
  output$9.s1.r112$x.min: 0, 0
  output$9.s1.x.max: 0, (void *)pos_inf
  output$9.s1.x.min: 0, (void *)pos_inf
  output$9.s1.y.max: 0, (void *)pos_inf
  output$9.s1.y.min: 0, (void *)pos_inf
  foo$1.s0.x.max: 0, (void *)pos_inf
  foo$1.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$1.s0.x.min: 0, (void *)pos_inf
  foo$1.s0.y.max: 0, (void *)pos_inf
  foo$1.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$1.s0.y.min: 0, (void *)pos_inf
  foo$10.s0.x.max: 0, (void *)pos_inf
  foo$10.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$10.s0.x.min: 0, (void *)pos_inf
  foo$10.s0.y.max: 0, (void *)pos_inf
  foo$10.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$10.s0.y.min: 0, (void *)pos_inf
  foo$11.s0.x.max: 0, (void *)pos_inf
  foo$11.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$11.s0.x.min: 0, (void *)pos_inf
  foo$11.s0.y.max: 0, (void *)pos_inf
  foo$11.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$11.s0.y.min: 0, (void *)pos_inf
  foo$12.s0.x.max: 0, (void *)pos_inf
  foo$12.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$12.s0.x.min: 0, (void *)pos_inf
  foo$12.s0.y.max: 0, (void *)pos_inf
  foo$12.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$12.s0.y.min: 0, (void *)pos_inf
  foo$13.s0.x.max: (void *)neg_inf, (void *)pos_inf
  foo$13.s0.x.min: (void *)neg_inf, (void *)pos_inf
  foo$13.s0.y.max: (void *)neg_inf, (void *)pos_inf
  foo$13.s0.y.min: (void *)neg_inf, (void *)pos_inf
  foo$2.s0.x.max: 0, (void *)pos_inf
  foo$2.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$2.s0.x.min: 0, (void *)pos_inf
  foo$2.s0.y.max: 0, (void *)pos_inf
  foo$2.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$2.s0.y.min: 0, (void *)pos_inf
  foo$3.s0.x.max: 0, (void *)pos_inf
  foo$3.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$3.s0.x.min: 0, (void *)pos_inf
  foo$3.s0.y.max: 0, (void *)pos_inf
  foo$3.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$3.s0.y.min: 0, (void *)pos_inf
  foo$4.s0.x.max: 0, (void *)pos_inf
  foo$4.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$4.s0.x.min: 0, (void *)pos_inf
  foo$4.s0.y.max: 0, (void *)pos_inf
  foo$4.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$4.s0.y.min: 0, (void *)pos_inf
  foo$5.s0.x.max: 0, (void *)pos_inf
  foo$5.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$5.s0.x.min: 0, (void *)pos_inf
  foo$5.s0.y.max: 0, (void *)pos_inf
  foo$5.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$5.s0.y.min: 0, (void *)pos_inf
  foo$6.s0.x.max: 0, (void *)pos_inf
  foo$6.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$6.s0.x.min: 0, (void *)pos_inf
  foo$6.s0.y.max: 0, (void *)pos_inf
  foo$6.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$6.s0.y.min: 0, (void *)pos_inf
  foo$7.s0.x.max: 0, (void *)pos_inf
  foo$7.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$7.s0.x.min: 0, (void *)pos_inf
  foo$7.s0.y.max: 0, (void *)pos_inf
  foo$7.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$7.s0.y.min: 0, (void *)pos_inf
  foo$8.s0.x.max: 0, (void *)pos_inf
  foo$8.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$8.s0.x.min: 0, (void *)pos_inf
  foo$8.s0.y.max: 0, (void *)pos_inf
  foo$8.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$8.s0.y.min: 0, (void *)pos_inf
  foo$9.s0.x.max: 0, (void *)pos_inf
  foo$9.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo$9.s0.x.min: 0, (void *)pos_inf
  foo$9.s0.y.max: 0, (void *)pos_inf
  foo$9.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo$9.s0.y.min: 0, (void *)pos_inf
  foo.s0.x.max: 0, (void *)pos_inf
  foo.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
  foo.s0.x.min: 0, (void *)pos_inf
  foo.s0.y.max: 0, (void *)pos_inf
  foo.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
  foo.s0.y.min: 0, (void *)pos_inf
}

steven-johnson avatar Jun 02 '22 20:06 steven-johnson

@steven-johnson Do you think you could run Google testing again? I think my tests just never had such enormous expressions, the example you provided should end reasonably fast now.

rootjalex avatar Jun 06 '22 22:06 rootjalex

> @steven-johnson Do you think you could run Google testing again? I think my tests just never had such enormous expressions, the example you provided should end reasonably fast now.

Testing now, but hiding an apparently-critical constant (the count arg to substitute_some_lets) as a default-value argument seems suboptimal. If 16 is a good value for everything, make it internal to the function and name and comment on it. If it's not a good value for everything, don't give it a default value.
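The "make it internal, name it, and comment on it" option could look roughly like the Python sketch below. This is not Halide's implementation: the tuple-based IR, the constant name `MAX_LET_SUBSTITUTIONS`, and the helpers `substitute`, `chain_of_lets`, and `count_lets` are all assumptions made for illustration, and the sketch assumes let names are unique so that substitution cannot capture variables.

```python
# Hypothetical named constant: the cap on how many lets get inlined.
# 16 is the value quoted in this thread; a larger cap risks huge expressions.
MAX_LET_SUBSTITUTIONS = 16


def substitute(name, value, expr):
    """Replace every free use of variable `name` in `expr` with `value`."""
    kind = expr[0]
    if kind == "var":
        return value if expr[1] == name else expr
    if kind == "const":
        return expr
    if kind == "let":
        _, inner, inner_value, body = expr
        new_value = substitute(name, value, inner_value)
        # An inner let with the same name shadows the outer binding.
        new_body = body if inner == name else substitute(name, value, body)
        return ("let", inner, new_value, new_body)
    op, a, b = expr
    return (op, substitute(name, value, a), substitute(name, value, b))


def substitute_some_lets(expr):
    """Inline at most MAX_LET_SUBSTITUTIONS lets; leave the rest in place."""
    budget = MAX_LET_SUBSTITUTIONS

    def walk(e):
        nonlocal budget
        if e[0] in ("var", "const"):
            return e
        if e[0] == "let":
            _, name, value, body = e
            value = walk(value)
            if budget > 0:
                budget -= 1
                return walk(substitute(name, value, body))
            return ("let", name, value, walk(body))
        op, a, b = e
        return (op, walk(a), walk(b))

    return walk(expr)


def chain_of_lets(n):
    """Build: let t0 = x in let t1 = t0 + 1 in ... in t{n-1}."""
    expr = ("var", f"t{n - 1}")
    for i in range(n - 1, 0, -1):
        expr = ("let", f"t{i}", ("+", ("var", f"t{i - 1}"), ("const", 1)), expr)
    return ("let", "t0", ("var", "x"), expr)


def count_lets(e):
    """Count the let nodes remaining in an expression."""
    if e[0] in ("var", "const"):
        return 0
    if e[0] == "let":
        return 1 + count_lets(e[2]) + count_lets(e[3])
    return count_lets(e[1]) + count_lets(e[2])


print(count_lets(substitute_some_lets(chain_of_lets(20))))  # 4
```

With the cap as a named module-level constant, callers cannot pass a different value by accident, and the comment next to it is the single place that explains the choice.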

steven-johnson avatar Jun 06 '22 23:06 steven-johnson

(Tests look good so far, stand by)

steven-johnson avatar Jun 07 '22 00:06 steven-johnson

@steven-johnson Thanks for the feedback, I added documentation explaining the chosen value.

rootjalex avatar Jun 07 '22 03:06 rootjalex

I don't see any regressions in Google now, LGTM, land with approval

steven-johnson avatar Jun 07 '22 15:06 steven-johnson

Thanks @steven-johnson ! @abadams good to go?

rootjalex avatar Jun 07 '22 18:06 rootjalex

Is this ready to land (pending green)?

steven-johnson avatar Jun 27 '22 18:06 steven-johnson

No - I still need to address Andrew's point on using deep_equality, and still need feedback on the substitution.

Sorry for dropping the ball on this, I was out for a conference for a week and have been playing catch-up on other duties in the week since. Will try to make more progress on it this week.

rootjalex avatar Jun 27 '22 20:06 rootjalex

No worries, just trying to catch up on things after returning from my own vacation -- no rush on this from my perspective.

steven-johnson avatar Jun 27 '22 20:06 steven-johnson

Thanks! I hope it was a fun vacation!

rootjalex avatar Jun 27 '22 20:06 rootjalex

Are we hoping that this will allow us to remove the HL_PERMIT_FAILED_UNROLL hack?

steven-johnson avatar Jul 19 '22 23:07 steven-johnson

Hey, just a periodic status check on this one.

steven-johnson avatar Jul 27 '22 22:07 steven-johnson

Sorry - I'm getting a tad behind, and this PR has been on the back burner for a bit. I will try to get to it in the next few weeks.

rootjalex avatar Aug 01 '22 16:08 rootjalex

Monday Morning Review Ping -- where does this PR stand?

steven-johnson avatar Aug 22 '22 16:08 steven-johnson

It still has a bit of work to be done, and I have not managed to get to it yet. I haven't forgotten it, and will aim to address it by the end of September (I know that's far away and I apologize, but I am currently in paper-writing mode + am about to move across the country)

rootjalex avatar Aug 22 '22 17:08 rootjalex