Partial floating-point overwrites lead to awkward decompilation
Version and Platform (required):
- Binary Ninja Version: 4.1.5272-dev (3a057a87)
- OS: macOS
- OS Version: 14.5
- CPU Architecture: M1
Bug Description:
On x86-64, most scalar floating point operations only partially overwrite their destination register, but most of the time the original code doesn't care about the previous contents.
For example, this function:
float floats(int cond, float a, float b) {
if (cond) {
return a + b;
} else {
return a * b;
}
}
might be compiled to:
00000000 test edi, edi
00000002 je 0xa
00000004 addss xmm0, xmm1
00000008 jmp 0xe
0000000a mulss xmm0, xmm1
0000000e retn
But the exact same assembly could also correspond to:
__m128 secretly_128(int cond, __m128 a, __m128 b) {
if (cond) {
a[0] += b[0];
} else {
a[0] *= b[0];
}
return a;
}
By default, Binary Ninja decompiles the function to this:
float floats(int32_t arg1, float arg2, float arg3) __pure
if (arg1 == 0)
arg2 = arg2 * arg3
else
arg2 = arg2 + arg3
return arg2
If I change the arguments to __int128, then it explicitly shows the partial overwrite:
int128_t floats(int32_t arg1, int128_t arg2 @ zmm0, int128_t arg3 @ zmm1) __pure
if (arg1 == 0)
arg2.d = arg2.d f* arg3.d
else
arg2.d = arg2.d f+ arg3.d
return arg2
So far, so good! Binary Ninja defaults to guessing that the values are scalars, and based on that guess, it doesn't bother me with what's happening to the upper bits. That seems like a fine default (it's definitely correct ~100% of the time in the code I'm looking at), and if the guess is wrong, I can override the type and get corrected output.
The problem is that for the float version, I'd prefer the output to be:
float result
if (arg1 == 0)
result = arg2 * arg3
else
result = arg2 + arg3
return result
If this is semantically a full overwrite, then there's no reason to use the same variable for the result as for the input.
I actually can get Binary Ninja to give me my desired output, but only by doing Split Variable At Definition twice and then merging three different resulting variables. This is cumbersome and also highly unsafe: if I fail to merge some of the variables, or merge too many variabels, I'll get wrong output.
(See also #5416, where BN thinks there's a partial overwrite when there really isn't.)
Edit: Test binary can be found here.