Partial floating-point overwrites lead to awkward decompilation

Open comex opened this issue 1 year ago • 0 comments

Version and Platform (required):

Binary Ninja Version: 4.1.5272-dev (3a057a87)
OS: macOS
OS Version: 14.5
CPU Architecture: M1

Bug Description:

On x86-64, most scalar floating point operations only partially overwrite their destination register, but most of the time the original code doesn't care about the previous contents.

For example, this function:

float floats(int cond, float a, float b) {
    if (cond) {
        return a + b;
    } else {
        return a * b;
    }
}

might be compiled to:

00000000  test    edi, edi
00000002  je      0xa

00000004  addss   xmm0, xmm1
00000008  jmp     0xe

0000000a  mulss   xmm0, xmm1

0000000e  retn

But the exact same assembly could also correspond to:

__m128 secretly_128(int cond, __m128 a, __m128 b) {
    if (cond) {
        a[0] += b[0];
    } else {
        a[0] *= b[0];
    }
    return a;
}

By default, Binary Ninja decompiles the function to this:

float floats(int32_t arg1, float arg2, float arg3) __pure

    if (arg1 == 0)
        arg2 = arg2 * arg3
    else
        arg2 = arg2 + arg3
    return arg2

If I change the arguments to __int128, then it explicitly shows the partial overwrite:

int128_t floats(int32_t arg1, int128_t arg2 @ zmm0, int128_t arg3 @ zmm1) __pure

    if (arg1 == 0)
        arg2.d = arg2.d f* arg3.d
    else
        arg2.d = arg2.d f+ arg3.d
    return arg2

So far, so good! Binary Ninja defaults to guessing that the values are scalars, and based on that guess, it doesn't bother me with what's happening to the upper bits. That seems like a fine default (it's definitely correct ~100% of the time in the code I'm looking at), and if the guess is wrong, I can override the type and get corrected output.

The problem is that for the float version, I'd prefer the output to be:

    float result
    if (arg1 == 0)
        result = arg2 * arg3
    else
        result = arg2 + arg3
    return result

If this is semantically a full overwrite, then there's no reason to use the same variable for the result as for the input.

I actually can get Binary Ninja to give me my desired output, but only by doing Split Variable At Definition twice and then merging three different resulting variables. This is cumbersome and also highly unsafe: if I fail to merge some of the variables, or merge too many variabels, I'll get wrong output.

(See also #5416, where BN thinks there's a partial overwrite when there really isn't.)

Edit: Test binary can be found here.

May 14 '24 20:05 comex