Unicode characters in symbols are extracted incorrectly
See the following example C++ program:
double \u00fc() { return 843; }
int main()
{
    \u00fc();
}
We should see the ü function being called (and we do in GDB). EDB instead shows a garbled name, something like Ã¼, in its symbol map and in the Disassembly and Analysis views.
Interesting... Well this is an encoding issue. Do we assume UTF-8? UTF-16? It looks like EDB is perhaps assuming Latin1 encoding in some places.
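To make the suspected failure mode concrete, here is a minimal sketch of what happens when UTF-8 bytes are read as Latin-1. It assumes the compiler stored the symbol name as UTF-8, so ü is the two bytes 0xC3 0xBC. The helper name `latin1BytesToUtf8` is hypothetical and is not EDB code; it only imitates a decoder that treats every byte as one Latin-1 character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of EDB): treat each byte of `raw` as one
// Latin-1 code point (U+0000..U+00FF) and re-encode the result as UTF-8.
// This is what a Latin-1 decoder effectively does to a UTF-8 symbol name.
std::string latin1BytesToUtf8(const std::string& raw) {
    std::string out;
    for (unsigned char byte : raw) {
        if (byte < 0x80) {
            // ASCII is identical in Latin-1 and UTF-8.
            out += static_cast<char>(byte);
        } else {
            // U+0080..U+00FF needs two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (byte >> 6));
            out += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return out;
}

// Self-check: the UTF-8 bytes of "ü" (C3 BC), misread as Latin-1, become
// U+00C3 U+00BC, which is displayed as "Ã¼" (UTF-8 bytes C3 83 C2 BC).
inline void selfCheck() {
    assert(latin1BytesToUtf8("\xC3\xBC") == "\xC3\x83\xC2\xBC");
}
```

Plain ASCII names pass through unchanged, which would explain why the problem only shows up for symbols with non-ASCII characters.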
I don't know if there is a "right answer" here because there is likely nothing to indicate what the appropriate encoding is. What are your thoughts?
I suppose we have the following options:
- Follow system locale (LC_CTYPE, I suppose)
- Assume UTF-8 on UNIX-like platforms (since it's the de facto standard there), UTF-16 on Windows (when we finally support it)
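A rough sketch of how the two options could be expressed, in case it helps the discussion. The names `EncodingPolicy` and `symbolEncoding` are made up for illustration and do not exist in EDB. The locale branch relies on POSIX `nl_langinfo(CODESET)`, so it is only compiled on non-Windows platforms.

```cpp
#include <cassert>
#include <clocale>
#include <string>
#ifndef _WIN32
#include <langinfo.h>  // POSIX: nl_langinfo, CODESET
#endif

// Hypothetical policy switch covering the two options above.
enum class EncodingPolicy { FollowLocale, PlatformDefault };

// Returns the name of the encoding to assume for symbol names.
std::string symbolEncoding(EncodingPolicy policy) {
#ifndef _WIN32
    if (policy == EncodingPolicy::FollowLocale) {
        // Adopt LC_CTYPE from the environment. Note that this changes
        // process-wide state, which a real implementation may not want.
        std::setlocale(LC_CTYPE, "");
        const char* codeset = nl_langinfo(CODESET);
        assert(codeset != nullptr);
        if (*codeset != '\0') {
            return codeset;  // e.g. "UTF-8" or "ISO-8859-1"
        }
    }
    return "UTF-8";   // de facto standard on UNIX-like platforms
#else
    (void)policy;
    return "UTF-16";  // assumption for a future Windows port
#endif
}
```

One drawback of following the locale is that the same binary would show different symbol names depending on the environment of the user running the debugger, whereas the platform default gives the same result everywhere.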
Funnily enough, QtCreator (at least 4.0.3) shows Ã¼ in its disassembly view, thus assuming Latin1.