alive2 Function call cross-product abstraction doesn't handle loops with may-exit edges

I've been hacking at another loop idiom:

declare void @escape(i8, i8, i8, i1, i8)

define i8 @src(i8 %val, i8 %start, i8 %extraoffset) {
entry:
  br label %loop

loop:
  %iv = phi i8 [ %start, %entry ], [ %iv.next, %loop ]
  %nbits = add nsw i8 %iv, %extraoffset ; 0
  %val.shifted = lshr i8 %val, %nbits
  %val.shifted.iszero = icmp eq i8 %val.shifted, 0
  %iv.next = add i8 %iv, 1

;   call void @escape(i8 %iv,
;                     i8 %nbits,
;                     i8 %val.shifted,
;                     i1 %val.shifted.iszero,
;                     i8 %iv.next)

  br i1 %val.shifted.iszero, label %end, label %loop

end:
  %iv.res = phi i8 [ %iv, %loop ]
  %nbits.res = phi i8 [ %nbits, %loop ]
  %val.shifted.res = phi i8 [ %val.shifted, %loop ]
  %val.shifted.iszero.res = phi i1 [ %val.shifted.iszero, %loop ]
  %iv.next.res = phi i8 [ %iv.next, %loop ]

;   call void @escape(i8 %iv.res,
;                     i8 %nbits.res,
;                     i8 %val.shifted.res,
;                     i1 %val.shifted.iszero.res,
;                     i8 %iv.next.res)

  ret i8 %iv.res
}

declare i8 @llvm.ctlz.i8(i8 %x, i1)
declare i8 @llvm.smax.i8(i8 %x, i8 %y)

define i8 @tgt(i8 %val, i8 %start, i8 %extraoffset) {
entry:
  %val.numleadingzeros = call i8 @llvm.ctlz.i8(i8 %val, i1 0)
  %val.numactivebits = sub i8 8, %val.numleadingzeros
  %extraoffset.neg = sub i8 0, %extraoffset
  %tmp = add i8 %val.numactivebits, %extraoffset.neg
  %iv.final = call i8 @llvm.smax.i8(i8 %tmp, i8 %start)
  %loop.tripcount = sub i8 %iv.final, %start
  br label %loop

loop:
  %loop.iv = phi i8 [ 0, %entry ], [ %loop.iv.next, %loop ]
  %loop.iv.next = add i8 %loop.iv, 1
  %loop.ivcheck = icmp eq i8 %loop.iv, %loop.tripcount

  %iv = add i8 %loop.iv, %start
  %nbits = add nsw i8 %iv, %extraoffset
  %val.shifted = lshr i8 %val, %nbits
  %iv.next = add i8 %iv, 1

;   call void @escape(i8 %iv,
;                     i8 %nbits,
;                     i8 %val.shifted,
;                     i1 %loop.ivcheck,
;                     i8 %iv.next)

  br i1 %loop.ivcheck, label %end, label %loop

end:
  %iv.res = phi i8 [ %iv.final, %loop ]
  %nbits.res = phi i8 [ %nbits, %loop ]
  %val.shifted.res = phi i8 [ %val.shifted, %loop ]
  %val.shifted.iszero.res = phi i1 [ %loop.ivcheck, %loop ]
  %iv.next.res = phi i8 [ %iv.next, %loop ]

;   call void @escape(i8 %iv.res,
;                     i8 %nbits.res,
;                     i8 %val.shifted.res,
;                     i1 %val.shifted.iszero.res,
;                     i8 %iv.next.res)

  ret i8 %iv.final
}

That source as-is finishes quickly:

$ time <...>/alive-tv /tmp/alive.ll --src-unroll=8 --tgt-unroll=8 --smt-to=600000
<..>
Transformation seems to be correct!

real    0m1.882s
user    0m1.866s
sys     0m0.016s

If i uncomment the calls outside of the loops (keeping in-loop calls commented out), it also finishes in reasonable timeframe:

Transformation seems to be correct!


real    0m5.208s
user    0m5.188s
sys     0m0.020s

If i uncomment the calls inside of the loops (keeping out-of-the-loop calls commented out), it also finishes in reasonable timeframe:

Transformation seems to be correct!


real    0m7.194s
user    0m7.178s
sys     0m0.016s

BUT. If i uncomment all calls, it does not finish with 1 minute timeout...

May 06 '21 15:05 LebedevRI

BUT. If i uncomment all calls, it does not finish with 1 minute timeout...

Didn't finish in 10 minues also.

May 06 '21 16:05 LebedevRI

Function calls are expensive. We have to do a cross-product of all calls across src/tgt in the worst case. That said, we do have an optimization that tries to syntactic reduce the set of relevant pairs (over-approximation). The abstraction doesn't work in this example as the final call in tgt can potentially match with any call inside the src loop because we don't track whether the loop iterations are synchronized across the two programs. But this is a very good example showing that the current abstraction isn't sufficient!

TL;DR: if you call different functions inside and outside the loop, it won't timeout (didn't check, but I'm fairly sure). At least until we manage to fix this.

May 06 '21 16:05 nunoplopes

TL;DR: if you call different functions inside and outside the loop, it won't timeout (didn't check, but I'm fairly sure). At least until we manage to fix this.

Aha! It finished in 10sec then. Thank you! This should unblock LIR patch.

May 06 '21 16:05 LebedevRI

This should unblock LIR patch.

Posted @ https://reviews.llvm.org/D102116

May 08 '21 22:05 LebedevRI

I've created a branch (https://github.com/AliveToolkit/alive2/tree/call_range_path) that uses a symbolic path to prune the cross-product. It helps with the test case above, but only when undef inputs are disabled. and it requires extra SMT calls; the pre-processor isn't good enough.

An alternative trick would be to consider the number of calls guaranteed to execute afterwards. Doesn't help the 2nd example of #744, but would handle the case above trivially. And it's cheap to run (less so to implement 😅). I'm wondering though how this matters in practice and if it's worth spending more time on this as it's not a super common problem and it's easy to circumvent by hand.

Sep 05 '21 17:09 nunoplopes