Improved find_constant_bound(s)
This PR provides a series of methods for removing/simplifying correlated expressions for find_constant_bounds:
- Bounded let-substitutions (~~n=100~~ edit: n=16). We don't want to always substitute all lets, but some constant bounds can be calculated just by a small number of substitutions.
- Removing unbounded terms from mins/maxs. A (simplistic) example is below:
Find lower bound on:
max(x, y) - (z + y)
With z : [0, 8]
This method would note that x is unbounded, and therefore the lhs of the max can be stripped, producing:
y - (z + y) -> 0 - z -> lower bounded by -8
- Affine term reordering. Halide’s TRS-based simplification can only cancel terms in sums up to a certain depth; this method uses a linear-time algorithm for canceling like terms.
- Pushing rationals inwards. This technique pushes multiplications inwards to allow stronger simplification. More importantly, it pushes divisions inwards via a safe approximation, best captured by the following inequalities:
// Addition:
(a / n) + (b / n) <= (a + b) / n <= (a / n) + (b / n) + 1
// Subtraction:
(a / n) - (b / n) - 1 <= (a - b) / n <= (a / n) - (b / n)
This allows us to push divisions inside additions/subtractions which can improve the ability to cancel like terms in a lot of generated equations.
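As a sanity check on the inequalities above, here is a minimal brute-force sketch. It assumes a positive divisor n and integer division that rounds toward negative infinity, which is what Python's `//` does; the function names are hypothetical and not part of this PR. The second function shows the kind of cancellation the rewrite enables: for `(x + 6) / 4 - x / 4` with unbounded `x`, pushing the division inward gives a lower bound of `x / 4 + 6 / 4 - x / 4 = 1` and an upper bound of 2.

```python
def check_division_bounds(limit=20):
    """Exhaustively check both inequalities for small integers.

    Assumes n > 0 and floor division (Python's // operator).
    """
    for n in range(1, limit):
        for a in range(-limit, limit + 1):
            for b in range(-limit, limit + 1):
                qa, qb = a // n, b // n
                # Addition: (a/n) + (b/n) <= (a + b)/n <= (a/n) + (b/n) + 1
                assert qa + qb <= (a + b) // n <= qa + qb + 1
                # Subtraction: (a/n) - (b/n) - 1 <= (a - b)/n <= (a/n) - (b/n)
                assert qa - qb - 1 <= (a - b) // n <= qa - qb
    return True


def difference_example(x):
    """Evaluate (x + 6) / 4 - x / 4, which has no constant bound on x itself."""
    return (x + 6) // 4 - x // 4


if __name__ == "__main__":
    print(check_division_bounds())
    # Every value falls in the interval predicted by the addition inequality.
    print(sorted({difference_example(x) for x in range(-50, 50)}))
```

The x / 4 terms only cancel once the division has been distributed over the sum, which is why the approximation helps even though it loses up to 1 per division.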
@abadams ran a series of experiments with randomly-generated schedules (n=256) on a series of apps (bgu, camera_pipe, conv_layer, depthwise_separable_conv, harris, hist, iir_blur, lens_blur, max_filter, stencil_chain, unsharp), and here is a summary of the results (percentages are total across the benchmarks):
Fewer failed unrolls: bgu (5 -> 3), camera_pipe (69 -> 58), harris (197 -> 160), lens_blur (158 -> 12), max_filter (338 -> 242), unsharp (110 -> 32)
Less memory: camera_pipe (0.6%), depthwise_separable_conv (0.3%), hist (0.1%), lens_blur (0.6%), unsharp (0.2%)
Fewer malloc calls: camera_pipe (173592 -> 171612), harris (608943 -> 608655), lens_blur (144324 -> 141888), stencil_chain (701085 -> 698712), unsharp (127428 -> 127284)
Some small runtime improvements (0.05% to 0.6%): bgu, camera_pipe, harris, hist, iir_blur, lens_blur, max_filter, stencil_chain
More memory: harris (0.06%), iir_blur (0.002%), stencil_chain (0.1%)
The runtime improvements might not be statistically significant, but I think better loop unrolling and improved stack allocations are important contributions.
For apps with no improved unrolling, compilation times increase by a small amount (~3%). With improved unrolling, there are large increases, but these are mostly because generating the unrolled code takes longer in both our codegen and LLVM codegen.
This work was part of a project with @abadams and @shoaibkamil .
@abadams -- should I pull this into Google and do some torture testing before landing, or are we pretty confident this is good?
Torture testing inside Google would be pretty helpful, thanks.
Testing in Google, I find only one new failure, but... it appears to be a hang (or near-infinite loop) inside Bounding small realizations... when compiling one specific Generator. Adding to the fun, it's in some proprietary stuff that might be hard to share publicly. Let me see if I can narrow things down further...
Yeah, we definitely get stuck ~forever in bound_small_allocations(), which was only changed to include the new header, so something about the change in definition has injected something here. Let me see if I can come up with a repro case I can share.
bound_small_allocations() is calling the new version(s) of find_constant_bound(s), which means that there is likely an allocation expression that is tripping up the new method - if you could log which expression (and the corresponding scope) is causing the hang, I can investigate (hopefully sharing that much is okay?).
So far, what I'm finding is that we have a fairly complex Expr that is the input to remove_unbounded_terms:
(((let t418 = min(max(min(max(min(max(min(max(min(max(min(max(min(max(min((foo$13.extent.0 + foo$13.min.0) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0) in (max(min(max(min(max(min(max(min(max(min(max(min(max(t418, 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), max(min(max(min(max(min(max(min(max(max(min(max(t418, 1) + 4, input.extent.0), t418), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 4, input.extent.0), 1) + 1) + 3, input.extent.0), 1) + -1)) - (let t407 = min(max(min(min(min(max(foo$13.min.0, -1), min(max(min(input.extent.0 + -1, foo$13.min.0), 0), max(min(foo$13.min.0 + 2, input.extent.0), 1) + -1)) + 2, input.extent.0) + -1, foo$13.min.0), 0), max(min(min(max(min(min(max(foo$13.min.0, -1) + 2, input.extent.0) + -1, foo$13.min.0), 0), max(min(min(max(min(input.extent.0 + -1, foo$13.min.0), 0), max(min(foo$13.min.0 + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) in (let t411 = min(max(min(min(min(min(max(min(input.extent.0 + -1, t407), 0), max(min(t407 + 2, input.extent.0), 1) + -1), t407) + 2, input.extent.0) + -1, t407), 0), max(min(min(max(min(min(t407 + 2, input.extent.0) + -1, t407), 0), max(min(min(max(min(input.extent.0 + -1, t407), 0), max(min(t407 + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) in (let t423 = min(max(min(min(min(min(max(min(input.extent.0 + -1, t411), 0), max(min(t411 + 2, input.extent.0), 1) + -1), t411) + 2, input.extent.0) + -1, t411), 0), max(min(min(max(min(min(t411 + 2, input.extent.0) + -1, t411), 0), max(min(min(max(min(input.extent.0 + -1, t411), 0), max(min(t411 + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) in (let t427 = 
min(max(min(min(min(min(max(min(input.extent.0 + -1, t423), 0), max(min(t423 + 2, input.extent.0), 1) + -1), t423) + 2, input.extent.0) + -1, t423), 0), max(min(min(max(min(min(t423 + 2, input.extent.0) + -1, t423), 0), max(min(min(max(min(input.extent.0 + -1, t423), 0), max(min(t423 + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1) in min(max(min(min(t427 + 2, input.extent.0) + -1, t427), 0), max(min(min(min(max(min(input.extent.0 + -1, t427), 0), min(min(max(min(t427 + 2, input.extent.0), 1), min(min(min(max(min(input.extent.0 + -1, t427), 0), min(min(max(min(t427 + 2, input.extent.0), 1), min(min(min(max(min(input.extent.0 + -1, t427), 0), min(min(max(min(t427 + 2, input.extent.0), 1), min(min(min(max(min(input.extent.0 + -1, t427), 0), min(min(max(min(t427 + 2, input.extent.0), 1), min(min(min(max(min(input.extent.0 + -1, t427), 0), min(max(min(t427 + 2, input.extent.0), 1), min(max(min(input.extent.0 + -1, t427), 0), min(max(min(t427 + 2, input.extent.0), 1), min(max(min(input.extent.0 + -1, t427), 0), min(max(min(t427 + 2, input.extent.0), 1), max(min(input.extent.0 + -1, t427), 0) + 1)) + 1)) + 1)), max(min(t427 + 2, input.extent.0), 1) + -1), min(max(min(input.extent.0 + -1, t427), 0), max(min(t427 + 2, input.extent.0), 1) + -1) + 2)), max(min(input.extent.0 + -1, t427), 0) + 1)), max(min(t427 + 2, input.extent.0), 1) + -1), min(max(min(input.extent.0 + -1, t427), 0), max(min(t427 + 2, input.extent.0), 1) + -1) + 2)), max(min(input.extent.0 + -1, t427), 0) + 1)), max(min(t427 + 2, input.extent.0), 1) + -1), min(max(min(input.extent.0 + -1, t427), 0), max(min(t427 + 2, input.extent.0), 1) + -1) + 2)), max(min(input.extent.0 + -1, t427), 0) + 1)), max(min(t427 + 2, input.extent.0), 1) + -1), min(max(min(input.extent.0 + -1, t427), 0), max(min(t427 + 2, input.extent.0), 1) + -1) + 2)), max(min(input.extent.0 + -1, t427), 0) + 1)), max(min(t427 + 2, input.extent.0), 1) + -1) + 2, input.extent.0), 1) + -1)))))) + 1)
which becomes insanely huge afterwards (too large to bother pasting here -- something like 7MB of text when the Expr is printed), and that's after the call to simplify().
EDIT: the corresponding scope at that point:
scope:
{
output$1.s0.x.max
output$1.s0.x.min
output$1.s0.y.max
output$1.s0.y.min
output$1.s1.r8$x.max
output$1.s1.r8$x.min
output$1.s1.x.max
output$1.s1.x.min
output$1.s1.y.max
output$1.s1.y.min
output$10.s0.x.max
output$10.s0.x.min
output$10.s0.y.max
output$10.s0.y.min
output$10.s1.r125$x.max
output$10.s1.r125$x.min
output$10.s1.x.max
output$10.s1.x.min
output$10.s1.y.max
output$10.s1.y.min
output$11.s0.x.max
output$11.s0.x.min
output$11.s0.y.max
output$11.s0.y.min
output$11.s1.r138$x.max
output$11.s1.r138$x.min
output$11.s1.x.max
output$11.s1.x.min
output$11.s1.y.max
output$11.s1.y.min
output$12.s0.x.max
output$12.s0.x.min
output$12.s0.y.max
output$12.s0.y.min
output$12.s1.r151$x.max
output$12.s1.r151$x.min
output$12.s1.x.max
output$12.s1.x.min
output$12.s1.y.max
output$12.s1.y.min
output$13.s0.x.max
output$13.s0.x.min
output$13.s0.y.max
output$13.s0.y.min
output$13.s1.r164$x.max
output$13.s1.r164$x.min
output$13.s1.x.max
output$13.s1.x.min
output$13.s1.y.max
output$13.s1.y.min
output$14.s0.x.max
output$14.s0.x.min
output$14.s0.y.max
output$14.s0.y.min
output$14.s1.r177$x.max
output$14.s1.r177$x.min
output$14.s1.x.max
output$14.s1.x.min
output$14.s1.y.max
output$14.s1.y.min
output$2.s0.x.max
output$2.s0.x.min
output$2.s0.y.max
output$2.s0.y.min
output$2.s1.r21$x.max
output$2.s1.r21$x.min
output$2.s1.x.max
output$2.s1.x.min
output$2.s1.y.max
output$2.s1.y.min
output$3.s0.x.max
output$3.s0.x.min
output$3.s0.y.max
output$3.s0.y.min
output$3.s1.r34$x.max
output$3.s1.r34$x.min
output$3.s1.x.max
output$3.s1.x.min
output$3.s1.y.max
output$3.s1.y.min
output$4.s0.x.max
output$4.s0.x.min
output$4.s0.y.max
output$4.s0.y.min
output$4.s1.r47$x.max
output$4.s1.r47$x.min
output$4.s1.x.max
output$4.s1.x.min
output$4.s1.y.max
output$4.s1.y.min
output$5.s0.x.max
output$5.s0.x.min
output$5.s0.y.max
output$5.s0.y.min
output$5.s1.r60$x.max
output$5.s1.r60$x.min
output$5.s1.x.max
output$5.s1.x.min
output$5.s1.y.max
output$5.s1.y.min
output$6.s0.x.max
output$6.s0.x.min
output$6.s0.y.max
output$6.s0.y.min
output$6.s1.r73$x.max
output$6.s1.r73$x.min
output$6.s1.x.max
output$6.s1.x.min
output$6.s1.y.max
output$6.s1.y.min
output$7.s0.x.max
output$7.s0.x.min
output$7.s0.y.max
output$7.s0.y.min
output$7.s1.r86$x.max
output$7.s1.r86$x.min
output$7.s1.x.max
output$7.s1.x.min
output$7.s1.y.max
output$7.s1.y.min
output$8.s0.x.max
output$8.s0.x.min
output$8.s0.y.max
output$8.s0.y.min
output$8.s1.r99$x.max
output$8.s1.r99$x.min
output$8.s1.x.max
output$8.s1.x.min
output$8.s1.y.max
output$8.s1.y.min
output$9.s0.x.max
output$9.s0.x.min
output$9.s0.y.max
output$9.s0.y.min
output$9.s1.r112$x.max
output$9.s1.r112$x.min
output$9.s1.x.max
output$9.s1.x.min
output$9.s1.y.max
output$9.s1.y.min
foo$1.s0.x.max
foo$1.s0.x.max.s
foo$1.s0.x.min
foo$1.s0.y.max
foo$1.s0.y.max.s
foo$1.s0.y.min
foo$10.s0.x.max
foo$10.s0.x.max.s
foo$10.s0.x.min
foo$10.s0.y.max
foo$10.s0.y.max.s
foo$10.s0.y.min
foo$11.s0.x.max
foo$11.s0.x.max.s
foo$11.s0.x.min
foo$11.s0.y.max
foo$11.s0.y.max.s
foo$11.s0.y.min
foo$12.s0.x.max
foo$12.s0.x.max.s
foo$12.s0.x.min
foo$12.s0.y.max
foo$12.s0.y.max.s
foo$12.s0.y.min
foo$13.s0.x.max
foo$13.s0.x.min
foo$13.s0.y.max
foo$13.s0.y.min
foo$2.s0.x.max
foo$2.s0.x.max.s
foo$2.s0.x.min
foo$2.s0.y.max
foo$2.s0.y.max.s
foo$2.s0.y.min
foo$3.s0.x.max
foo$3.s0.x.max.s
foo$3.s0.x.min
foo$3.s0.y.max
foo$3.s0.y.max.s
foo$3.s0.y.min
foo$4.s0.x.max
foo$4.s0.x.max.s
foo$4.s0.x.min
foo$4.s0.y.max
foo$4.s0.y.max.s
foo$4.s0.y.min
foo$5.s0.x.max
foo$5.s0.x.max.s
foo$5.s0.x.min
foo$5.s0.y.max
foo$5.s0.y.max.s
foo$5.s0.y.min
foo$6.s0.x.max
foo$6.s0.x.max.s
foo$6.s0.x.min
foo$6.s0.y.max
foo$6.s0.y.max.s
foo$6.s0.y.min
foo$7.s0.x.max
foo$7.s0.x.max.s
foo$7.s0.x.min
foo$7.s0.y.max
foo$7.s0.y.max.s
foo$7.s0.y.min
foo$8.s0.x.max
foo$8.s0.x.max.s
foo$8.s0.x.min
foo$8.s0.y.max
foo$8.s0.y.max.s
foo$8.s0.y.min
foo$9.s0.x.max
foo$9.s0.x.max.s
foo$9.s0.x.min
foo$9.s0.y.max
foo$9.s0.y.max.s
foo$9.s0.y.min
foo.s0.x.max
foo.s0.x.max.s
foo.s0.x.min
foo.s0.y.max
foo.s0.y.max.s
foo.s0.y.min
}
Is it possible to know if the scope has any values actually set? Sorry, I didn't realize that printing scope only prints the names, I need the corresponding intervals as well.
Definitely seems like the issue here is substitute_some_lets. Not sure exactly what the count should be, but 100 is too high.
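To illustrate why a large substitution count can blow up, here is a toy model, not Halide's implementation: expressions are nested tuples, and each let in a chain refers to the previous variable twice, loosely mimicking the t407/t411/t423/t427 chain in the Expr above. All names (`build_chain`, `inline_lets`, `node_count`) are hypothetical. Inlining `count` lets roughly doubles the expression each time, so the size grows as `4 * 2**count - 3` in this model.

```python
def node_count(e):
    """Count nodes in a tuple-encoded expression: ("op", child, child, ...)."""
    if isinstance(e, tuple):
        return 1 + sum(node_count(c) for c in e[1:])
    return 1


def substitute(e, name, value):
    """Replace every occurrence of variable `name` in `e` with `value`."""
    if e == name:
        return value
    if isinstance(e, tuple):
        return (e[0],) + tuple(substitute(c, name, value) for c in e[1:])
    return e


def build_chain(depth):
    """Build lets t0..t{depth-1}; each value uses the previous variable twice."""
    lets = []
    prev = "x"
    for i in range(depth):
        name = f"t{i}"
        lets.append((name, ("min", ("+", prev, 2), prev)))
        prev = name
    return lets, prev  # the body is just the last variable


def inline_lets(lets, body, count):
    """Inline at most `count` lets, starting with the one closest to the body."""
    chosen = lets[-count:] if count > 0 else []
    for name, value in reversed(chosen):
        body = substitute(body, name, value)
    return body


if __name__ == "__main__":
    lets, body = build_chain(20)
    for count in (0, 1, 2, 4, 8, 16):
        print(count, node_count(inline_lets(lets, body, count)))
```

In this model a budget of 16 already yields a few hundred thousand nodes, and a budget of 100 would be far beyond anything that can be built, which is consistent with capping the count at a small value.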
> Is it possible to know if the scope has any values actually set? Sorry, I didn't realize that printing scope only prints the names, I need the corresponding intervals as well.
{
output$1.s0.x.max: 0, (void *)pos_inf
output$1.s0.x.min: 0, (void *)pos_inf
output$1.s0.y.max: 0, (void *)pos_inf
output$1.s0.y.min: 0, (void *)pos_inf
output$1.s1.r8$x.max: 3, 3
output$1.s1.r8$x.min: 0, 0
output$1.s1.x.max: 0, (void *)pos_inf
output$1.s1.x.min: 0, (void *)pos_inf
output$1.s1.y.max: 0, (void *)pos_inf
output$1.s1.y.min: 0, (void *)pos_inf
output$10.s0.x.max: 0, (void *)pos_inf
output$10.s0.x.min: 0, (void *)pos_inf
output$10.s0.y.max: 0, (void *)pos_inf
output$10.s0.y.min: 0, (void *)pos_inf
output$10.s1.r125$x.max: 3, 3
output$10.s1.r125$x.min: 0, 0
output$10.s1.x.max: 0, (void *)pos_inf
output$10.s1.x.min: 0, (void *)pos_inf
output$10.s1.y.max: 0, (void *)pos_inf
output$10.s1.y.min: 0, (void *)pos_inf
output$11.s0.x.max: 0, (void *)pos_inf
output$11.s0.x.min: 0, (void *)pos_inf
output$11.s0.y.max: 0, (void *)pos_inf
output$11.s0.y.min: 0, (void *)pos_inf
output$11.s1.r138$x.max: 3, 3
output$11.s1.r138$x.min: 0, 0
output$11.s1.x.max: 0, (void *)pos_inf
output$11.s1.x.min: 0, (void *)pos_inf
output$11.s1.y.max: 0, (void *)pos_inf
output$11.s1.y.min: 0, (void *)pos_inf
output$12.s0.x.max: 0, (void *)pos_inf
output$12.s0.x.min: 0, (void *)pos_inf
output$12.s0.y.max: 0, (void *)pos_inf
output$12.s0.y.min: 0, (void *)pos_inf
output$12.s1.r151$x.max: 3, 3
output$12.s1.r151$x.min: 0, 0
output$12.s1.x.max: 0, (void *)pos_inf
output$12.s1.x.min: 0, (void *)pos_inf
output$12.s1.y.max: 0, (void *)pos_inf
output$12.s1.y.min: 0, (void *)pos_inf
output$13.s0.x.max: 0, (void *)pos_inf
output$13.s0.x.min: 0, (void *)pos_inf
output$13.s0.y.max: 0, (void *)pos_inf
output$13.s0.y.min: 0, (void *)pos_inf
output$13.s1.r164$x.max: 3, 3
output$13.s1.r164$x.min: 0, 0
output$13.s1.x.max: 0, (void *)pos_inf
output$13.s1.x.min: 0, (void *)pos_inf
output$13.s1.y.max: 0, (void *)pos_inf
output$13.s1.y.min: 0, (void *)pos_inf
output$14.s0.x.max: (void *)neg_inf, (void *)pos_inf
output$14.s0.x.min: (void *)neg_inf, (void *)pos_inf
output$14.s0.y.max: (void *)neg_inf, (void *)pos_inf
output$14.s0.y.min: (void *)neg_inf, (void *)pos_inf
output$14.s1.r177$x.max: 3, 3
output$14.s1.r177$x.min: 0, 0
output$14.s1.x.max: (void *)neg_inf, (void *)pos_inf
output$14.s1.x.min: (void *)neg_inf, (void *)pos_inf
output$14.s1.y.max: (void *)neg_inf, (void *)pos_inf
output$14.s1.y.min: (void *)neg_inf, (void *)pos_inf
output$2.s0.x.max: 0, (void *)pos_inf
output$2.s0.x.min: 0, (void *)pos_inf
output$2.s0.y.max: 0, (void *)pos_inf
output$2.s0.y.min: 0, (void *)pos_inf
output$2.s1.r21$x.max: 3, 3
output$2.s1.r21$x.min: 0, 0
output$2.s1.x.max: 0, (void *)pos_inf
output$2.s1.x.min: 0, (void *)pos_inf
output$2.s1.y.max: 0, (void *)pos_inf
output$2.s1.y.min: 0, (void *)pos_inf
output$3.s0.x.max: 0, (void *)pos_inf
output$3.s0.x.min: 0, (void *)pos_inf
output$3.s0.y.max: 0, (void *)pos_inf
output$3.s0.y.min: 0, (void *)pos_inf
output$3.s1.r34$x.max: 3, 3
output$3.s1.r34$x.min: 0, 0
output$3.s1.x.max: 0, (void *)pos_inf
output$3.s1.x.min: 0, (void *)pos_inf
output$3.s1.y.max: 0, (void *)pos_inf
output$3.s1.y.min: 0, (void *)pos_inf
output$4.s0.x.max: 0, (void *)pos_inf
output$4.s0.x.min: 0, (void *)pos_inf
output$4.s0.y.max: 0, (void *)pos_inf
output$4.s0.y.min: 0, (void *)pos_inf
output$4.s1.r47$x.max: 3, 3
output$4.s1.r47$x.min: 0, 0
output$4.s1.x.max: 0, (void *)pos_inf
output$4.s1.x.min: 0, (void *)pos_inf
output$4.s1.y.max: 0, (void *)pos_inf
output$4.s1.y.min: 0, (void *)pos_inf
output$5.s0.x.max: 0, (void *)pos_inf
output$5.s0.x.min: 0, (void *)pos_inf
output$5.s0.y.max: 0, (void *)pos_inf
output$5.s0.y.min: 0, (void *)pos_inf
output$5.s1.r60$x.max: 3, 3
output$5.s1.r60$x.min: 0, 0
output$5.s1.x.max: 0, (void *)pos_inf
output$5.s1.x.min: 0, (void *)pos_inf
output$5.s1.y.max: 0, (void *)pos_inf
output$5.s1.y.min: 0, (void *)pos_inf
output$6.s0.x.max: 0, (void *)pos_inf
output$6.s0.x.min: 0, (void *)pos_inf
output$6.s0.y.max: 0, (void *)pos_inf
output$6.s0.y.min: 0, (void *)pos_inf
output$6.s1.r73$x.max: 3, 3
output$6.s1.r73$x.min: 0, 0
output$6.s1.x.max: 0, (void *)pos_inf
output$6.s1.x.min: 0, (void *)pos_inf
output$6.s1.y.max: 0, (void *)pos_inf
output$6.s1.y.min: 0, (void *)pos_inf
output$7.s0.x.max: 0, (void *)pos_inf
output$7.s0.x.min: 0, (void *)pos_inf
output$7.s0.y.max: 0, (void *)pos_inf
output$7.s0.y.min: 0, (void *)pos_inf
output$7.s1.r86$x.max: 3, 3
output$7.s1.r86$x.min: 0, 0
output$7.s1.x.max: 0, (void *)pos_inf
output$7.s1.x.min: 0, (void *)pos_inf
output$7.s1.y.max: 0, (void *)pos_inf
output$7.s1.y.min: 0, (void *)pos_inf
output$8.s0.x.max: 0, (void *)pos_inf
output$8.s0.x.min: 0, (void *)pos_inf
output$8.s0.y.max: 0, (void *)pos_inf
output$8.s0.y.min: 0, (void *)pos_inf
output$8.s1.r99$x.max: 3, 3
output$8.s1.r99$x.min: 0, 0
output$8.s1.x.max: 0, (void *)pos_inf
output$8.s1.x.min: 0, (void *)pos_inf
output$8.s1.y.max: 0, (void *)pos_inf
output$8.s1.y.min: 0, (void *)pos_inf
output$9.s0.x.max: 0, (void *)pos_inf
output$9.s0.x.min: 0, (void *)pos_inf
output$9.s0.y.max: 0, (void *)pos_inf
output$9.s0.y.min: 0, (void *)pos_inf
output$9.s1.r112$x.max: 3, 3
output$9.s1.r112$x.min: 0, 0
output$9.s1.x.max: 0, (void *)pos_inf
output$9.s1.x.min: 0, (void *)pos_inf
output$9.s1.y.max: 0, (void *)pos_inf
output$9.s1.y.min: 0, (void *)pos_inf
foo$1.s0.x.max: 0, (void *)pos_inf
foo$1.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$1.s0.x.min: 0, (void *)pos_inf
foo$1.s0.y.max: 0, (void *)pos_inf
foo$1.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$1.s0.y.min: 0, (void *)pos_inf
foo$10.s0.x.max: 0, (void *)pos_inf
foo$10.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$10.s0.x.min: 0, (void *)pos_inf
foo$10.s0.y.max: 0, (void *)pos_inf
foo$10.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$10.s0.y.min: 0, (void *)pos_inf
foo$11.s0.x.max: 0, (void *)pos_inf
foo$11.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$11.s0.x.min: 0, (void *)pos_inf
foo$11.s0.y.max: 0, (void *)pos_inf
foo$11.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$11.s0.y.min: 0, (void *)pos_inf
foo$12.s0.x.max: 0, (void *)pos_inf
foo$12.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$12.s0.x.min: 0, (void *)pos_inf
foo$12.s0.y.max: 0, (void *)pos_inf
foo$12.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$12.s0.y.min: 0, (void *)pos_inf
foo$13.s0.x.max: (void *)neg_inf, (void *)pos_inf
foo$13.s0.x.min: (void *)neg_inf, (void *)pos_inf
foo$13.s0.y.max: (void *)neg_inf, (void *)pos_inf
foo$13.s0.y.min: (void *)neg_inf, (void *)pos_inf
foo$2.s0.x.max: 0, (void *)pos_inf
foo$2.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$2.s0.x.min: 0, (void *)pos_inf
foo$2.s0.y.max: 0, (void *)pos_inf
foo$2.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$2.s0.y.min: 0, (void *)pos_inf
foo$3.s0.x.max: 0, (void *)pos_inf
foo$3.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$3.s0.x.min: 0, (void *)pos_inf
foo$3.s0.y.max: 0, (void *)pos_inf
foo$3.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$3.s0.y.min: 0, (void *)pos_inf
foo$4.s0.x.max: 0, (void *)pos_inf
foo$4.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$4.s0.x.min: 0, (void *)pos_inf
foo$4.s0.y.max: 0, (void *)pos_inf
foo$4.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$4.s0.y.min: 0, (void *)pos_inf
foo$5.s0.x.max: 0, (void *)pos_inf
foo$5.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$5.s0.x.min: 0, (void *)pos_inf
foo$5.s0.y.max: 0, (void *)pos_inf
foo$5.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$5.s0.y.min: 0, (void *)pos_inf
foo$6.s0.x.max: 0, (void *)pos_inf
foo$6.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$6.s0.x.min: 0, (void *)pos_inf
foo$6.s0.y.max: 0, (void *)pos_inf
foo$6.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$6.s0.y.min: 0, (void *)pos_inf
foo$7.s0.x.max: 0, (void *)pos_inf
foo$7.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$7.s0.x.min: 0, (void *)pos_inf
foo$7.s0.y.max: 0, (void *)pos_inf
foo$7.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$7.s0.y.min: 0, (void *)pos_inf
foo$8.s0.x.max: 0, (void *)pos_inf
foo$8.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$8.s0.x.min: 0, (void *)pos_inf
foo$8.s0.y.max: 0, (void *)pos_inf
foo$8.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$8.s0.y.min: 0, (void *)pos_inf
foo$9.s0.x.max: 0, (void *)pos_inf
foo$9.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo$9.s0.x.min: 0, (void *)pos_inf
foo$9.s0.y.max: 0, (void *)pos_inf
foo$9.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo$9.s0.y.min: 0, (void *)pos_inf
foo.s0.x.max: 0, (void *)pos_inf
foo.s0.x.max.s: (void *)neg_inf, (void *)pos_inf
foo.s0.x.min: 0, (void *)pos_inf
foo.s0.y.max: 0, (void *)pos_inf
foo.s0.y.max.s: (void *)neg_inf, (void *)pos_inf
foo.s0.y.min: 0, (void *)pos_inf
}
@steven-johnson Do you think you could run Google testing again? I think my tests just never had such enormous expressions. The example you provided should now finish reasonably fast.
Testing now, but hiding an apparently-critical constant (the count arg to substitute_some_lets) as a default-value argument seems suboptimal. If 16 is a good value for everything, make it internal to the function and name and comment on it. If it's not a good value for everything, don't give it a default value.
(Tests look good so far, stand by)
@steven-johnson Thanks for the feedback, I added documentation explaining the chosen value.
I don't see any regressions in Google now, LGTM, land with approval
Thanks @steven-johnson ! @abadams good to go?
Is this ready to land (pending green)?
No - I still need to address Andrew's point on using deep_equality, and still need feedback on the substitution.
Sorry for dropping the ball on this, I was out for a conference for a week and have been playing catch-up on other duties in the week since. Will try to make more progress on it this week.
No worries, just trying to catch up on things after returning from my own vacation -- no rush on this from my perspective.
Thanks! I hope it was a fun vacation!
Are we hoping that this will allow us to remove the HL_PERMIT_FAILED_UNROLL hack?
Hey, just a periodic status check on this one.
Sorry - I'm getting a tad behind, and this PR has been on the back burner for a bit. I will try to get to it in the next few weeks.
Monday Morning Review Ping -- where does this PR stand?
It still has a bit of work to be done, and I have not managed to get to it yet. I haven't forgotten it, and will aim to address it by the end of September (I know that's far away and I apologize, but I am currently in paper-writing mode + am about to move across the country)