binaryninja-api icon indicating copy to clipboard operation
binaryninja-api copied to clipboard

References to data variables should be shown as constants when used known constant contexts

Open joelreymont opened this issue 2 years ago • 8 comments

Version and Platform (required):

  • Binary Ninja Version: 3.4.4248-dev, c26685e2
  • OS: macos
  • OS Version: 13.3
  • CPU Architecture: arm64

It looks like BN is producing the wrong IL for movw r2, #0x47e, e.g.

00000bdfa  40f27e42   movw    r2, #0x47e
00000bdfe  86e80300   stm     r6, {r0, r1} {var_a68} {var_a68} {var_a64}  {0x41e5999a}  {0x41dc0000}
00000be02  04f10801   add     r1, r4, #8  {SUB_BDD8_DATA}
00000be06  b2a8       add     r0, sp, #0x2c8 {var_920}
00000be08  bbf08cff   bl      #memcpy

According to IDA, this is

 v6 = memcpy((unsigned int)v176, (char *)SUB_BDD8_DATA, 0x47Eu);

but BN produces HLIL

memcpy(&var_920, &SUB_BDD8_DATA, &data_47e)

This starts from LLIL where it does not recognize 0x47E as an immediate value

40 @ 00000bdfa  r2 = &data_47e

joelreymont avatar Apr 21 '23 13:04 joelreymont

It's being inferred to be a pointer to data - it looks like this is because 0x47e is a valid data address. Is the binary base in IDA also 0?

negasora avatar Apr 21 '23 13:04 negasora

Binary base is zero for both IDA and BN.

On Fri, Apr 21, 2023 at 4:58 PM Josh F @.***> wrote:

It's being inferred to be a pointer to data - it looks like this is because 0x47e is a valid data address. Is the binary base in IDA also 0?

— Reply to this email directly, view it on GitHub https://github.com/Vector35/binaryninja-api/issues/4259#issuecomment-1517877243, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAESZ4RNDNXXIPB7FCFDRDXCKG6TANCNFSM6AAAAAAXG4VKM4 . You are receiving this because you authored the thread.Message ID: @.***>

joelreymont avatar Apr 21 '23 14:04 joelreymont

Ok so while it's TECHNICALLY correct, the way we show it is really misleading. We should either simplify instances of &<datavar> to their static value, or never emit that as a reference in the first place

Edit: In some places, it's nice to be able to see a reference to a data variable, like memcpy(&data_100, &data_40, 8) so this shouldn't be applied everywhere. An easy workaround for this in most cases is to rebase the binary to some non-zero value

negasora avatar Apr 21 '23 14:04 negasora

V35 should search for "Encourage Salesman Prompt Delay" to find the database.

joelreymont avatar May 19 '23 06:05 joelreymont

Forgot to add that rebasing binary to a non-zero value may not work as it may break string references.

joelreymont avatar Sep 30 '24 08:09 joelreymont

The only time it would break string references I would expect it to break all data variable references if the binary actually should be based at 0. Are you seeing strings break when rebasing and the binary image isn't supposed to be based at zero? If so, that might be a different bug.

psifertex avatar Sep 30 '24 13:09 psifertex

I have a binary that must be based at address 0x500000 (example) for strings to be referenced. This is because the compiler inserted a bunch of fixed references to 0x50xxxx into the code.

I don't think it matters what the base address is, though. You will break string references as long as there are fixed addresses in the code and you rebase to the address the compiler does not expect.

joelreymont avatar Sep 30 '24 13:09 joelreymont

That's my point though. Only if those fixed locations that require a base address of 0x0 are ones that would cause issue. All the rest, like your example don't require the work around but should already have less issue if they're based correctly.

psifertex avatar Sep 30 '24 14:09 psifertex