Parsing escape sequences does not always work

Open ahumenberger opened this issue 2 years ago • 1 comment

The following piece of code currently produces the syntax tree below. Notice that the \s escape sequence is swallowed: the \n on the second line gets an escape_sequence node, but no node covers the \s at [2, 2] - [2, 4].

I guess the solution is that the set of characters allowed in escape sequences needs to be extended with the characters allowed in regular expressions?

String s = """
  \n abc
  \s def
""";

program [0, 0] - [4, 0]
  local_variable_declaration [0, 0] - [3, 4]
    type: type_identifier [0, 0] - [0, 6]
    declarator: variable_declarator [0, 7] - [3, 3]
      name: identifier [0, 7] - [0, 8]
      value: string_literal [0, 11] - [3, 3]
        multiline_string_fragment [0, 14] - [1, 2]
        escape_sequence [1, 2] - [1, 4]
        multiline_string_fragment [1, 4] - [2, 2]
        multiline_string_fragment [2, 4] - [3, 0]
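
To make the swallowed span explicit, here is a minimal sketch in plain JavaScript that scans the child ranges of string_literal from the tree above and reports any position no child covers. The `findGaps` helper and the object layout are hypothetical and only for illustration; they are not part of tree-sitter's API.

```javascript
// Child ranges of string_literal, copied from the tree above as [row, column] pairs.
const children = [
  { type: "multiline_string_fragment", start: [0, 14], end: [1, 2] },
  { type: "escape_sequence",           start: [1, 2],  end: [1, 4] },
  { type: "multiline_string_fragment", start: [1, 4],  end: [2, 2] },
  { type: "multiline_string_fragment", start: [2, 4],  end: [3, 0] },
];

// Hypothetical helper: report every span between consecutive siblings
// that is not covered by any child node.
function findGaps(nodes) {
  const gaps = [];
  for (let i = 1; i < nodes.length; i++) {
    const prevEnd = nodes[i - 1].end;
    const nextStart = nodes[i].start;
    if (prevEnd[0] !== nextStart[0] || prevEnd[1] !== nextStart[1]) {
      gaps.push({ start: prevEnd, end: nextStart });
    }
  }
  return gaps;
}

console.log(JSON.stringify(findGaps(children))); // [{"start":[2,2],"end":[2,4]}]
```

The single gap it reports, [2, 2] - [2, 4], is exactly where \s sits in the source.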

ahumenberger, Sep 04 '23 10:09

On second thought, it's not about regular expressions. It's about allowing all the characters from the list here, right? https://docs.oracle.com/javase/specs/jls/se17/html/jls-3.html#jls-3.10.7

But then I'm wondering why \a and \v are allowed, as both would yield syntax errors in Java.

_escape_sequence: $ => choice(
      prec(2, token.immediate(seq('\\', /[^abfnrtvxu'\"\\\?]/))),
      prec(1, $.escape_sequence)
    ),
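
One possible reading of the rule above: the character class starts with ^, so it is negated, and the first branch matches a backslash followed by any character that is not in the list. The sketch below is plain JavaScript run outside tree-sitter. It checks single characters against the same class; `matchesFirstBranch` is a hypothetical helper name.

```javascript
// Character class copied verbatim from the grammar rule quoted above.
// The leading ^ negates it: it matches any character NOT in the list.
const notListed = /[^abfnrtvxu'\"\\\?]/;

// Hypothetical helper: would backslash + ch be matched by the first
// (anonymous, token.immediate) branch of _escape_sequence?
function matchesFirstBranch(ch) {
  return ch.length === 1 && notListed.test(ch);
}

for (const ch of ["s", "n", "a", "v"]) {
  console.log("\\" + ch, matchesFirstBranch(ch));
}
// \s true
// \n false
// \a false
// \v false
```

Under that reading, \a and \v are excluded from the first branch rather than allowed by it, and \s is matched by the anonymous token, which may be why no node shows up for it in the tree. Whether \a and \v are accepted at all then depends on the definition of $.escape_sequence, which is not quoted here.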

ahumenberger, Sep 05 '23 04:09

This was fixed in https://github.com/tree-sitter/tree-sitter-java/commit/be8ecb62c444d7e63ccabb91be39e8c22cd22673. Apologies for forgetting to mention that and close this out.

amaanq, Dec 20 '24 02:12