Same function with matching name, but with signatures changes, reports as a similarity match.
Not quite sure its a bug/defect, but something I hope can be improved.
Describe the bug
I have an embedded project in which we have heavily refactored a common argument from an unsigned int to an opaque pointer (void *), and changed its order. The project also has a number of very similarly named functions, that vary by only a couple of characters.
The similarity comparison report ends up matching the non-matching function names in addition to the changed signature, resulting in the following two entries in the "Similar" report:
writeReferencePIMU(unsigned int, p_data_t*)
writeReferenceIMU(p_data_t*, void*)
and then again a few lines further down in the report:
writeReferenceIMU(unsigned int, p_data_t*)
writeReferencePIMU(p_data_t*, void*)
By itself this is trivial, but its ends up comparing and reporting the differences of the two functions size deltas; if one function is notably larger than the other, there is a subsequent "gain" on one line, and an equal loss on the other.
To Reproduce Pretty clear from above; make two functions, named similarly with very minor difference. Refactor both functions arguments Generate a report with similarity enabled.
Expected behavior These function show not show up in the similarity report (or if they do, they should be paired together). In the event that a single matching name exists on both sides, it should probably stop attempting to match other "similar" function names. Ideally, an option (perhaps as a default) to first isolate matching function names, regardless of arguments; if there is a match, treat that higher/first, and then look for differences in signature/argument lists. I understand this can get complicated with multiple functions with the same name in both old and new, so I accept that this may be non-trivial.
Maybe another way to describe this approach; if a function name matches in both old and new, but the signature has changed, and there is no other possible candidates with the same name (ie, 1-to-1 match), then treat this pair as a single matching item in the "Similar" report, and do not allow them to be compared with any other symbol/function.
I am familiar with the --similarity_threshold argument, and have been using it, but I find that in doing so, it rejects some things which are significantly dis-similar, but which should still be rightly matched.
Screenshots
Here is another example that illustrates the issue, but differently (both images are from the same report).
By removing these first-ordered matches based on name, with only a single candidate from each old/new binary to match against, it would effectively remove these other errant "similar" matches.
Desktop (please complete the following information):
- OS: Ubuntu 24.04
- Browser Chrome
- Version [e.g. 22]