Function call cross-product abstraction doesn't handle loops with may-exit edges
I've been hacking at another loop idiom:
declare void @escape(i8, i8, i8, i1, i8)
define i8 @src(i8 %val, i8 %start, i8 %extraoffset) {
entry:
br label %loop
loop:
%iv = phi i8 [ %start, %entry ], [ %iv.next, %loop ]
%nbits = add nsw i8 %iv, %extraoffset ; 0
%val.shifted = lshr i8 %val, %nbits
%val.shifted.iszero = icmp eq i8 %val.shifted, 0
%iv.next = add i8 %iv, 1
; call void @escape(i8 %iv,
; i8 %nbits,
; i8 %val.shifted,
; i1 %val.shifted.iszero,
; i8 %iv.next)
br i1 %val.shifted.iszero, label %end, label %loop
end:
%iv.res = phi i8 [ %iv, %loop ]
%nbits.res = phi i8 [ %nbits, %loop ]
%val.shifted.res = phi i8 [ %val.shifted, %loop ]
%val.shifted.iszero.res = phi i1 [ %val.shifted.iszero, %loop ]
%iv.next.res = phi i8 [ %iv.next, %loop ]
; call void @escape(i8 %iv.res,
; i8 %nbits.res,
; i8 %val.shifted.res,
; i1 %val.shifted.iszero.res,
; i8 %iv.next.res)
ret i8 %iv.res
}
declare i8 @llvm.ctlz.i8(i8 %x, i1)
declare i8 @llvm.smax.i8(i8 %x, i8 %y)
define i8 @tgt(i8 %val, i8 %start, i8 %extraoffset) {
entry:
%val.numleadingzeros = call i8 @llvm.ctlz.i8(i8 %val, i1 0)
%val.numactivebits = sub i8 8, %val.numleadingzeros
%extraoffset.neg = sub i8 0, %extraoffset
%tmp = add i8 %val.numactivebits, %extraoffset.neg
%iv.final = call i8 @llvm.smax.i8(i8 %tmp, i8 %start)
%loop.tripcount = sub i8 %iv.final, %start
br label %loop
loop:
%loop.iv = phi i8 [ 0, %entry ], [ %loop.iv.next, %loop ]
%loop.iv.next = add i8 %loop.iv, 1
%loop.ivcheck = icmp eq i8 %loop.iv, %loop.tripcount
%iv = add i8 %loop.iv, %start
%nbits = add nsw i8 %iv, %extraoffset
%val.shifted = lshr i8 %val, %nbits
%iv.next = add i8 %iv, 1
; call void @escape(i8 %iv,
; i8 %nbits,
; i8 %val.shifted,
; i1 %loop.ivcheck,
; i8 %iv.next)
br i1 %loop.ivcheck, label %end, label %loop
end:
%iv.res = phi i8 [ %iv.final, %loop ]
%nbits.res = phi i8 [ %nbits, %loop ]
%val.shifted.res = phi i8 [ %val.shifted, %loop ]
%val.shifted.iszero.res = phi i1 [ %loop.ivcheck, %loop ]
%iv.next.res = phi i8 [ %iv.next, %loop ]
; call void @escape(i8 %iv.res,
; i8 %nbits.res,
; i8 %val.shifted.res,
; i1 %val.shifted.iszero.res,
; i8 %iv.next.res)
ret i8 %iv.final
}
That source as-is finishes quickly:
$ time <...>/alive-tv /tmp/alive.ll --src-unroll=8 --tgt-unroll=8 --smt-to=600000
<..>
Transformation seems to be correct!
real 0m1.882s
user 0m1.866s
sys 0m0.016s
If i uncomment the calls outside of the loops (keeping in-loop calls commented out), it also finishes in reasonable timeframe:
Transformation seems to be correct!
real 0m5.208s
user 0m5.188s
sys 0m0.020s
If i uncomment the calls inside of the loops (keeping out-of-the-loop calls commented out), it also finishes in reasonable timeframe:
Transformation seems to be correct!
real 0m7.194s
user 0m7.178s
sys 0m0.016s
BUT. If i uncomment all calls, it does not finish with 1 minute timeout...
BUT. If i uncomment all calls, it does not finish with 1 minute timeout...
Didn't finish in 10 minues also.
Function calls are expensive. We have to do a cross-product of all calls across src/tgt in the worst case. That said, we do have an optimization that tries to syntactic reduce the set of relevant pairs (over-approximation). The abstraction doesn't work in this example as the final call in tgt can potentially match with any call inside the src loop because we don't track whether the loop iterations are synchronized across the two programs. But this is a very good example showing that the current abstraction isn't sufficient!
TL;DR: if you call different functions inside and outside the loop, it won't timeout (didn't check, but I'm fairly sure). At least until we manage to fix this.
TL;DR: if you call different functions inside and outside the loop, it won't timeout (didn't check, but I'm fairly sure). At least until we manage to fix this.
Aha! It finished in 10sec then. Thank you! This should unblock LIR patch.
This should unblock LIR patch.
Posted @ https://reviews.llvm.org/D102116
I've created a branch (https://github.com/AliveToolkit/alive2/tree/call_range_path) that uses a symbolic path to prune the cross-product. It helps with the test case above, but only when undef inputs are disabled. and it requires extra SMT calls; the pre-processor isn't good enough.
An alternative trick would be to consider the number of calls guaranteed to execute afterwards. Doesn't help the 2nd example of #744, but would handle the case above trivially. And it's cheap to run (less so to implement 😅). I'm wondering though how this matters in practice and if it's worth spending more time on this as it's not a super common problem and it's easy to circumvent by hand.