
Various fixes for parsing string and char literals

Open theawless opened this issue 7 months ago • 0 comments

Fix unicode backslash escaped double quote in string and char literals

Closes https://github.com/tree-sitter/tree-sitter-java/issues/209

As described in the JLS: https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.2, Unicode escapes are translated before the input is tokenized. Hence this is allowed:

Original: '\u005c''
After Unicode escape processing: '\''
Final interpretation: a char literal containing a single quote character
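
As a quick check of the example above, here is a minimal Java sketch that should compile under the JLS translation rules. The class and method names are hypothetical and not part of this PR.

```java
// Hypothetical demo class; the names are illustrative and not taken from the PR.
class UnicodeBackslashDemo {
    // The Unicode escape below is translated to a backslash before tokenization,
    // so the compiler should see the same char literal as in plainQuote().
    static char escapedQuote() {
        return '\u005c'';
    }

    static char plainQuote() {
        return '\'';
    }

    // Same idea for a string literal: the body is a single escaped double quote.
    static String escapedDoubleQuote() {
        return "\u005c"";
    }

    public static void main(String[] args) {
        System.out.println(escapedQuote() == plainQuote());
        System.out.println(escapedDoubleQuote().equals("\""));
    }
}
```

A parser that treats the quote right after the escape as the closing delimiter would mis-tokenize both literals, which is the failure this change addresses.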

To handle this behaviour, I've updated the existing rules so that wherever \\ was accepted, \u005c is now accepted as well.
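
The idea can be sketched with a regular expression. This is not the grammar's actual rule (the grammar itself is not written in Java). It is a simplified approximation using `java.util.regex`, with hypothetical names. It assumes only simple one-character escapes, and allowing repeated `u` is also an assumption of this sketch.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the rule used by the tree-sitter-java grammar.
class EscapedQuoteSketch {
    // A backslash in raw source text: either its Unicode-escaped spelling
    // (backslash, one or more 'u', then 005c) or a literal backslash.
    static final String BACKSLASH = "(?:\\\\u+005[cC]|\\\\)";

    // A simple escape: a backslash (either spelling) followed by one escaped
    // character or by another backslash. Octal and general Unicode escapes
    // are deliberately left out of this sketch.
    static final String SIMPLE_ESCAPE = BACKSLASH + "(?:[btnfr\"']|" + BACKSLASH + ")";

    // A char literal: one ordinary character or one simple escape between quotes.
    static final Pattern CHAR_LITERAL =
            Pattern.compile("'(?:[^'\\\\\\r\\n]|" + SIMPLE_ESCAPE + ")'");

    static boolean isCharLiteral(String rawSource) {
        return CHAR_LITERAL.matcher(rawSource).matches();
    }

    public static void main(String[] args) {
        // Each argument is raw source text, written here as a Java string.
        System.out.println(isCharLiteral("'\\''"));      // plain escaped quote
        System.out.println(isCharLiteral("'\\u005c''")); // Unicode-escaped backslash, then quote
        System.out.println(isCharLiteral("'\\u005c'"));  // a lone escaped backslash is unterminated
    }
}
```

With this pattern the first two inputs should be accepted and the third rejected, since a Unicode-escaped backslash with nothing after it escapes the closing quote.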

Allow multiple u chars in unicode escapes within string literals

Closes https://github.com/tree-sitter/tree-sitter-java/issues/207

I feel it's weird, but it is mentioned in the JLS: https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.2:~:text=If%20an%20eligible%20%5C%20is%20followed%20by%20u%2C%20or%20more%20than%20one%20u%2C%20and%20the%20last%20u%20is%20not%20followed%20by%20four%20hexadecimal%20digits%2C%20then%20a%20compile%2Dtime%20error%20occurs - a unicode escape may contain multiple u characters after the backslash, as long as the last u is followed by four hex digits.
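
A minimal Java sketch of the rule quoted above, assuming a standard-conforming compiler. The class and method names are hypothetical and only illustrate that the spellings are equivalent.

```java
// Hypothetical demo class; the names are illustrative and not taken from the PR.
class MultipleUDemo {
    // Conventional spelling of the letter A as a Unicode escape.
    static String oneU() {
        return "\u0041";
    }

    // Extra 'u' characters are permitted as long as four hex digits follow the last one.
    static String threeU() {
        return "\uuu0041";
    }

    // The same rule applies inside char literals.
    static char charWithTwoU() {
        return '\uu0041';
    }

    public static void main(String[] args) {
        System.out.println(oneU().equals(threeU()));
        System.out.println(charWithTwoU() == 'A');
    }
}
```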

theawless, Jun 12 '25 04:06