Unicode Escapes handled incorrectly

Yorwba opened this issue 9 years ago.

I do not have TextMate to reproduce this locally, but the grammar is also used by GitHub's syntax highlighting, right?

In this comment thread on Hacker News I learned that Java expands Unicode escapes like \u002F\u002A (for /*) everywhere in the source, including inside comments and string literals, before the code is tokenized.
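As a rough illustration of what "everywhere" means, here is a simplified sketch of the translation step described in section 3.3 of the Java Language Specification: an eligible backslash (one preceded by an even number of backslashes), one or more u characters and four hex digits are replaced by a single character before any tokenizing happens. The names UnicodeEscapes and expand are made up for this sketch; it is not how javac itself is written.

```java
/**
 * Simplified sketch of the Unicode escape translation that JLS section 3.3
 * applies to raw source text before tokenization. Not javac's actual code.
 */
final class UnicodeEscapes {

    // Value of an ASCII hex digit, or -1. Only ASCII hex digits are accepted.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    static String expand(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int precedingBackslashes = 0; // raw backslashes immediately before index i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            // A backslash can start an escape only if an even number of raw
            // backslashes precede it, so a doubled backslash is plain text.
            boolean eligible = c == '\\' && precedingBackslashes % 2 == 0;
            if (eligible && i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < raw.length() && raw.charAt(j) == 'u') {
                    j++; // one or more 'u' characters are allowed
                }
                if (j + 4 > raw.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(raw.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("bad hex digit at index " + k);
                    }
                    value = value * 16 + digit;
                }
                // The produced character is emitted as-is and never re-scanned.
                out.append((char) value);
                i = j + 4;
                precedingBackslashes = 0; // the last raw character was a hex digit
            } else {
                out.append(c);
                precedingBackslashes = (c == '\\') ? precedingBackslashes + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }
}
```

Applied to the raw text DBG_KEY=\u0022\u0029\u003B\u002F\u002A\u002F, expand yields DBG_KEY=");/*/, which is the same substitution as in the example that follows.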

This means that code like

someService.addTrustedKey(SecuritySettings.getTrustedKey("SomeService"));
/* For local debugging:
  // Copy this into your client config: 
  DBG_KEY=\u0022\u0029\u003B\u002F\u002A\u002F
  someService.addTrustedKey("\u0022\u0029\u003B\u002F\u002A\u002F");
*/

is equivalent to

someService.addTrustedKey(SecuritySettings.getTrustedKey("SomeService"));
/* For local debugging:
  // Copy this into your client config: 
  DBG_KEY=");/*/
  someService.addTrustedKey("");/*/");
*/

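but the two versions are highlighted differently (thanks to HN user mistercow for the example). After expansion, the */ at the end of the DBG_KEY line closes the block comment, so someService.addTrustedKey(""); is compiled as live code, while a highlighter that ignores the escapes shows everything up to the final */ as one comment.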
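To check this without TextMate, here is a minimal self-contained adaptation of the example. TrustedKeyDemo, run() and the trustedKeys list are hypothetical stand-ins for someService; the comment block is the original one with only the call target changed. Compiled with javac, run() returns [real-key, ] because the escaped */ ends the comment and the add("") call runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for someService: it only records the keys it receives.
final class TrustedKeyDemo {
    static List<String> run() {
        List<String> trustedKeys = new ArrayList<>();
        trustedKeys.add("real-key");
        /* For local debugging:
          DBG_KEY=\u0022\u0029\u003B\u002F\u002A\u002F
          trustedKeys.add("\u0022\u0029\u003B\u002F\u002A\u002F");
        */
        return trustedKeys;
    }
}
```

Viewed through a highlighter that does not expand escapes, the block above looks like an ordinary comment, which is exactly the problem this issue describes.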

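I hope it is clear that this difference in highlighting is quite deceptive, for example during code review, where readers rely on the highlighting to tell which lines are commented out. A complete fix would require replacing every literal character in the grammar's regular expressions (e.g. \*) with something like (\*|\\u002[aA]), or really (\*|\\u+002[aA]), because Java also accepts repeated u's as in \uu002A. Character classes such as \s are worse still, because every character they match needs its own escaped alternative. This could probably be done by a preprocessor that rewrites the grammar, but someone would have to write it.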
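A sketch of the kind of rewrite such a preprocessor could perform, written with java.util.regex rather than the Oniguruma engine that TextMate grammars use: the hypothetical helper literalOrEscape turns each ASCII character of a token into an alternation of the literal character and its escaped spelling, allowing several u's and either case of hex digit.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical grammar-preprocessor helper: turns a literal ASCII token into
 * a regex that also matches the token spelled with Java Unicode escapes.
 */
final class EscapeAwareRegex {

    // Regex fragment matching c either literally or as a backslash, one or
    // more 'u', and four hex digits in either case (e.g. 002a or 002A).
    static String literalOrEscape(char c) {
        if (c > 0x7F) {
            throw new IllegalArgumentException("this sketch only handles ASCII");
        }
        String hex = String.format("%04x", (int) c); // '*' becomes "002a"
        StringBuilder escaped = new StringBuilder("\\\\u+"); // regex: literal backslash, then u+
        for (char h : hex.toCharArray()) {
            if (Character.isLetter(h)) {
                // hex letter: accept lower or upper case
                escaped.append('[').append(h).append(Character.toUpperCase(h)).append(']');
            } else {
                escaped.append(h);
            }
        }
        return "(?:" + Pattern.quote(String.valueOf(c)) + "|" + escaped + ")";
    }

    // Concatenates one fragment per character of the token.
    static String literalOrEscape(String token) {
        StringBuilder sb = new StringBuilder();
        for (char c : token.toCharArray()) {
            sb.append(literalOrEscape(c));
        }
        return sb.toString();
    }
}
```

The pattern built from "/*" matches /*, \u002F\u002A and \uu002f*. It ignores the even-backslash rule, so it would also match the text \\u002A inside a string literal, and the Pattern.quote quoting may need adapting for the grammar's own regex dialect.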

A simpler temporary fix would be to mark Unicode escapes of ASCII characters (\u0000 to \u007F) as invalid everywhere, to warn unsuspecting readers that something fishy might be going on.
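The simpler fix mostly needs one pattern. Below is a sketch, again using java.util.regex and a hypothetical AsciiEscapeFinder class, that reports every eligible Unicode escape in the range \u0000 to \u007F while skipping doubled backslashes such as the \\u0041 text inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical lint-style check: find Unicode escapes that encode ASCII characters. */
final class AsciiEscapeFinder {
    // Group 1: an even run of backslashes (escaped backslashes, not escapes).
    // Group 2: an eligible backslash, one or more 'u', then 0000 to 007F.
    private static final Pattern ASCII_ESCAPE =
            Pattern.compile("(?<!\\\\)((?:\\\\\\\\)*)(\\\\u+00[0-7][0-9A-Fa-f])");

    // Returns the text of each suspicious escape, in source order.
    static List<String> find(String source) {
        List<String> hits = new ArrayList<>();
        Matcher m = ASCII_ESCAPE.matcher(source);
        while (m.find()) {
            hits.add(m.group(2));
        }
        return hits;
    }
}
```

On the DBG_KEY line from the example it reports all six escapes, while an escape such as \u00e9 (é) is left alone because it does not encode an ASCII character.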

Sep 01 '16 09:09 Yorwba