owi icon indicating copy to clipboard operation
owi copied to clipboard

Add dichotomic symbolic clz and ctz

Open krtab opened this issue 2 years ago • 9 comments

This is supposedly SMT friendly, and keeps the bit-hacks for the concrete case.

krtab avatar Feb 26 '24 18:02 krtab

Do you have approximate numbers on how much faster is it ?

redianthus avatar Feb 26 '24 18:02 redianthus

This is super cool! I'd love to see the differences between this branching approach and using ite expressions inside aux. Solver-wise, it would probably take much more time, but it might be more memory-friendly?

EDIT: maybe I'll write one such implementation in encoding and see :sweat_smile:

filipeom avatar Feb 26 '24 18:02 filipeom

@zapashcanon :

Do you have approximate numbers on how much faster is it ?

Well the previous implementation when run on tests gave

owi: internal error, uncaught exception:
     Failure("Bv: Unsupported Z3 unary operator \"clz\"")

So it may have been faster but certainly less useful :wink:

@filipeom

This is super cool! I'd love to see the differences between this branching approach and using ite expressions inside aux. Solver-wise, it would probably take much more time, but it might be more memory-friendly?

Could you clarify what you mean by "ite expressions" ?

krtab avatar Feb 27 '24 08:02 krtab

I also added ctz in a similar vein. @zapashcanon we can see on thursday how to better organize this code

krtab avatar Feb 27 '24 17:02 krtab

Could you clarify what you mean by "ite expressions" ?

I mean the ternary operator if-then-else. It would allow us to write the same code in a single expression. Here is the pseudo code for the clz algorithm you implemented, using the ternary operator (ite):

let clz n =
  let rec aux (lb : int) ub =
    if ub = lb + 1 then I32.const_i32 (Int32.of_int (32 - ub))
    else
      let mid = (lb + ub) / 2 in
      let two_pow_mid = Int32.shl 1l (Int32.of_int mid) in
      let cond = I32.lt_u n (I32.const_i32 two_pow_mid) in
      ite cond (aux lb mid) (aux mid ub)
  ite (I32.eqz n) (I32.const_i32 32l) (aux 0 32)

I did a quick and dirty implementation of this directly in Z3, which you can find here: https://github.com/formalsec/encoding/commit/2f83d349e41c2a1a225766654db0b957b8b17d2c (hope you don't mind!)

After that, I measured execution times and paths explored on clz_64.wast. The results are as follows:

Approach Number of Paths User time (s) Wall clock (s)
branching 2081 192.30 37.17
ite-exprs 65 6.19 1.75

I referred to your approach as branching and the other as ite-exprs. It appears that the use of branching in the ctz implementation is significantly penalizing.

I also experimented with removing the second assert in the test, and the performance was much faster, essentially identical between the two approaches, with the same number of paths as well.

Perhaps it's worth providing an implementation for these operators in encoding and see how they compare against this approach. What do you think?

filipeom avatar Feb 27 '24 23:02 filipeom

Hi @filipeom !

Very interesting thank you. A few points:

  1. First of all, I agree that my current implementations is likely to not be the best one, other possibilities include: Z3's ite expressions (as you implemented), changing the choice decision tree for a direct expression of 32 choices each with only two bounds added to the path condition, changing the < check for a more bitlevel one etc.
  2. However, actually choosing the best one can only be done with more (and more realistic) benchmarks IMO. We not only want an implementation that is "locally efficient", meaning that it quickly allows owi-sym to process a single clz instruction, we also want one that makes it efficient to reason on the newly added path conditions afterwards. My intuition of Z3 inner working is way too poor to guess what will be the best choice in general, but having benchmarks on realistic code (and not a unit test like here) will help.
  3. "I also experimented with removing the second assert in the test, and the performance was much faster, essentially identical between the two approaches, with the same number of paths as well." Interesting, I wonder what makes it so. Maybe the clz-based ctz implementation forces Z3 to exhaustively check all possible bit configurations?

krtab avatar Feb 28 '24 13:02 krtab

... We not only want an implementation that is "locally efficient", meaning that it quickly allows owi-sym to process a single clz instruction, we also want one that makes it efficient to reason on the newly added path conditions afterwards ...

Yes, you are correct. However, I want to note that having easier-to-reason path conditions might not be as helpful when there are exponentially as many branches to check for satisfiability. Even though you can quickly reason about them in the SMT solver, it requires exponential memory usage to do so. Regardless, even if I prefer approaches that do not branch to encode such operations, I must admit that this approach might be better in cases where there is a good path exploration strategy.

Interesting, I wonder what makes it so. Maybe the clz-based ctz implementation forces Z3 to exhaustively check all possible bit configurations?

I think it's because $countLeadingZeros will always branch, so it ends up losing the ability to merge all 64 states with the ite approach.

But in summary:

  • Branching in these operations can lead to an earlier state explosion situation.
  • It might be the case that this approach can lead to easier path-specific reasoning, allowing for faster bug finding.
  • More benchmarking should be done in actual programs that use clz or similar operations that reason about the entire bitvector.
  • I dislike these extreme branching approaches as they create spurious paths and subsequently generate test cases that don't contribute to any increase in line coverage of the program under testing.

filipeom avatar Feb 28 '24 13:02 filipeom

@filipeom, do you think this could be added more easily (i.e. without the optional reference to a function hack) with your current work on the memory?

redianthus avatar Jul 30 '24 13:07 redianthus

I think the issue remains. Even though we can now use Symbolic_choice_without_memory elsewhere, we can't use it in Symbolic_value because Symbolic_choice_without_memory is still defined using Symbolic_value (circular dependency). I think the way to go here would be as you outlined in https://github.com/OCamlPro/owi/pull/195#discussion_r1503875986.

filipeom avatar Jul 30 '24 14:07 filipeom

Hi @filipeom, I'm also wondering if we could not simply add this in smtml directly? I wanted to have a look at your commit doing this for Z3 but the link is not working anymore.

redianthus avatar Feb 13 '25 15:02 redianthus

Hi @filipeom, I'm also wondering if we could not simply add this in smtml directly? I wanted to have a look at your commit doing this for Z3 but the link is not working anymore.

I've made it solver agnostic using the parametric mappings:

https://github.com/formalsec/smtml/blob/980506166e80a23ed252771080d38f7d18415bfb/src/mappings.ml#L310-L334

It's already in smtml and follows Arthur's implementation

filipeom avatar Feb 13 '25 15:02 filipeom

Awwww, OK. So we could simply use that now? :sweat_smile:

redianthus avatar Feb 13 '25 16:02 redianthus

Awwww, OK. So we could simply use that now? 😅

Yeah I believe so. I think only popcnt is still missing, but I can add a dumb ite expression just to have it in

filipeom avatar Feb 14 '25 08:02 filipeom

OK so actually, clz and ctz are already working, so I'm closing this. popcount is not available in smtml yet but there's already a few opened issues about it

redianthus avatar Feb 15 '25 16:02 redianthus