edb-debugger

Unicode characters in symbols are extracted incorrectly

Open 10110111 opened this issue 7 years ago • 2 comments

See the following example C++ program:

double \u00fc() { return 843; }
int main()
{
    \u00fc();
}

We should see the function ü being called (and GDB does show this). EDB instead shows a garbled, mis-decoded version of the name in its symbol map, and likewise in the Disassembly and Analysis views.


10110111 • Feb 27 '18 12:02

Interesting... Well, this is an encoding issue. Do we assume UTF-8? UTF-16? It looks like EDB may be assuming Latin1 encoding in some places.

I don't know if there is a "right answer" here, because there is likely nothing in the binary to indicate which encoding its symbol names use. What are your thoughts?

eteran • Feb 27 '18 18:02

I suppose we have the following options:

  • Follow the system locale (LC_CTYPE, I suppose)
  • Assume UTF-8 on UNIX-like platforms (since it's the de facto standard there) and UTF-16 on Windows (once we finally support it)

Funnily enough, QtCreator (at least 4.0.3) also shows a garbled name in its disassembly view, so it too appears to assume Latin1.

10110111 • Feb 27 '18 18:02