Z3 python bitvector performance cliff
Hi, I use Z3 via the python bindings and have noticed an interesting performance cliff.
Depending on "THE_VALUE" -- selects how many instructions to translate -- performance looks like this:
THE_VALUE = 400000 # real 0m7,980s stats: {'mov': 288711, 'call': 111283, 'xor': 6}
THE_VALUE = 420000 # real 0m15,706s stats: {'mov': 303147, 'call': 116845, 'xor': 8}
THE_VALUE = 430000 # real 1m21,680s stats: {'mov': 310362, 'call': 119628, 'xor': 10}
THE_VALUE = 440000 # real 4m24,924s stats: {'mov': 317576, 'call': 122414, 'xor': 10}
With all instructions (3 million), z3 ooms around ~60gb and 30 minutes.
The problem occurs when adding the constraint to z3, i.e. s.add(registers["eax"] == 0)
[i] Time to add constraint: 222.20120286941528
Using cvc5, it works fine in basically no time.
Reproduction steps
pip install z3-solver
pip install cvc5
cd $(mktemp -d)
wget https://github.com/Z3Prover/z3/files/13553176/z3_python_bitvector_performance_cliff.tar.gz
tar -xzf z3_python_bitvector_performance_cliff.tar.gz
time python solv.py # be patient!
time python solv_cvc5.py
foobar.cvc5.smt2 contains the smt2 statement of cvc5 and foobar.smt2 the statements of z3.
Note how much bigger the cvc5 one is.
z3_python_bitvector_performance_cliff.tar.gz
solv.py (originally from https://github.com/pwning/public-writeup/tree/master/codegate2023/GateCodeGate):
from z3 import *
with open("./gatecodegate.asm", "r") as f:
code = f.read()
code = (
code.split(" ___isoc99_scanf", 1)[1]
.split("lea rdi, aCorrectCodegat", 1)[0]
.strip()
)
code = code.split("\n")
code = [x.strip() for x in code]
registers = {
x: BitVec(x, 32)
for x in [
"eax",
"ebx",
"ecx",
"edx",
"esi",
"edi",
"[rsp+1C8h+var_4C]",
"[rsp+1C8h+var_50]",
"[rsp+1C8h+var_54]",
"[rsp+1C8h+var_58]",
"[rsp+1C8h+var_5C]",
"[rsp+1C8h+var_60]",
"[rsp+1C8h+var_64]",
"[rsp+1C8h+var_68]",
]
}
gates = {
"gate_and": lambda x, y: x & y,
"gate_or": lambda x, y: x | y,
"gate_xor": lambda x, y: x ^ y,
"gate_not": lambda x, y: ~x,
"gate_shl": lambda x, y: x << y,
"gate_shr": lambda x, y: LShR(x, y),
}
def getval(x, line):
if x in registers:
return registers[x]
elif x.endswith("h"):
return BitVecVal(int(x[:-1], 16), 32)
elif x.isdigit():
return BitVecVal(int(x), 32)
else:
raise Exception(f"Unknown operand: {x}; {line}")
print(f"[i] {len(code)} lines of code")
THE_VALUE = 400000 # real 0m7,980s stats: {'mov': 288711, 'call': 111283, 'xor': 6}
THE_VALUE = 420000 # real 0m15,706s stats: {'mov': 303147, 'call': 116845, 'xor': 8}
THE_VALUE = 430000 # real 1m21,680s stats: {'mov': 310362, 'call': 119628, 'xor': 10}
THE_VALUE = 440000 # real 4m24,924s stats: {'mov': 317576, 'call': 122414, 'xor': 10}
stats = {}
for line_num, line in enumerate(code[0:THE_VALUE]):
line = line.replace(",", "").split()
opcode = line[0]
if opcode == "mov":
stats["mov"] = stats.get("mov", 0) + 1
dst = line[1]
src = line[2]
registers[dst] = getval(src, (line_num, line))
elif opcode == "xor":
stats["xor"] = stats.get("xor", 0) + 1
dst = line[1]
src = line[2]
registers[dst] = registers[dst] ^ getval(src, (line_num, line))
elif opcode == "call":
stats["call"] = stats.get("call", 0) + 1
gate = line[1]
result = gates[gate](registers["edi"], registers["esi"])
registers["eax"] = result
else:
raise Exception(f"Unknown opcode: {opcode}; {line}")
print("[i] Finished parsing code")
s = Solver()
s.set("smtlib2_log", "log.smt2")
import time
before = time.time()
s.add(registers["eax"] == 0)
print(f"[i] Time to add constraint: {time.time() - before}")
# print(registers["eax"])
print(f"THE_VALUE: {THE_VALUE}")
print(f"stats: {stats}")
before = time.time()
open("foobar.smt2", "w").write(s.sexpr())
print(f"[i] Time to write smt2: {time.time() - before}")
exit(0)
print(s.check())
print(s.model())
todo: rewrite your script to C++ to make it easier to use perf-profiler. @nunoplopes may potentially be interested as it is more likely to be about hash-table performance so a way to test replacing the hash-table and/or hash functions.
@intrigus-lgtm Maybe try my latest attempt at changing the hash function: https://github.com/nunoplopes/z3 It improved things for me, at least.
@NikolajBjorner I've rewritten the script to C++. It should be functionally equivalent, but I don't know C++ too well.
There is still a performance cliff:
// THE_VALUE = 400000; // 144ms new-hash: 143ms
// THE_VALUE = 420000; // 7427ms new-hash: 7555ms
// THE_VALUE = 430000; // 69206ms new-hash: 71535ms
// THE_VALUE = 440000; // 228012ms new-hash: 237135ms
(Unpack the tar, then compile solv.cpp and place the resulting executable next to the cleaned.asm file) z3_cpp_bitvector_performance_cliff.tar.gz
@nunoplopes I also tried your attempt at changing the hash function, but it did not help, as can be seen in the above comments.
Is there anything else missing from my side or that I could/should do?
Is there anything else missing from my side or that I could/should do?
Short of financing Z3's development or contributing with PRs or ideas on how to fix this, no.