elf_diff icon indicating copy to clipboard operation
elf_diff copied to clipboard

Same function with matching name, but with signatures changes, reports as a similarity match.

Open kylemallory opened this issue 1 year ago • 0 comments

Not quite sure its a bug/defect, but something I hope can be improved.

Describe the bug I have an embedded project in which we have heavily refactored a common argument from an unsigned int to an opaque pointer (void *), and changed its order. The project also has a number of very similarly named functions, that vary by only a couple of characters.

The similarity comparison report ends up matching the non-matching function names in addition to the changed signature, resulting in the following two entries in the "Similar" report:

writeReferencePIMU(unsigned int, p_data_t*)
writeReferenceIMU(p_data_t*, void*)

and then again a few lines further down in the report:

writeReferenceIMU(unsigned int, p_data_t*)
writeReferencePIMU(p_data_t*, void*)

By itself this is trivial, but its ends up comparing and reporting the differences of the two functions size deltas; if one function is notably larger than the other, there is a subsequent "gain" on one line, and an equal loss on the other.

To Reproduce Pretty clear from above; make two functions, named similarly with very minor difference. Refactor both functions arguments Generate a report with similarity enabled.

Expected behavior These function show not show up in the similarity report (or if they do, they should be paired together). In the event that a single matching name exists on both sides, it should probably stop attempting to match other "similar" function names. Ideally, an option (perhaps as a default) to first isolate matching function names, regardless of arguments; if there is a match, treat that higher/first, and then look for differences in signature/argument lists. I understand this can get complicated with multiple functions with the same name in both old and new, so I accept that this may be non-trivial.

Maybe another way to describe this approach; if a function name matches in both old and new, but the signature has changed, and there is no other possible candidates with the same name (ie, 1-to-1 match), then treat this pair as a single matching item in the "Similar" report, and do not allow them to be compared with any other symbol/function.

I am familiar with the --similarity_threshold argument, and have been using it, but I find that in doing so, it rejects some things which are significantly dis-similar, but which should still be rightly matched.

Screenshots image

Here is another example that illustrates the issue, but differently (both images are from the same report). image image

By removing these first-ordered matches based on name, with only a single candidate from each old/new binary to match against, it would effectively remove these other errant "similar" matches.

Desktop (please complete the following information):

  • OS: Ubuntu 24.04
  • Browser Chrome
  • Version [e.g. 22]

kylemallory avatar Sep 14 '24 06:09 kylemallory