tree-sitter-bash icon indicating copy to clipboard operation
tree-sitter-bash copied to clipboard

Herestring not recognized _after_ stdout redirect.

Open danfuzz opened this issue 2 years ago • 4 comments

It looks like the parser has trouble with herestrings that come after a redirect of stdout. For example, a parse of the following indicates an error on the here-string token <<<:

cat >x <<< 'x'

Parse tree:

(program [0, 0] - [1, 0]
  (redirected_statement [0, 0] - [0, 14]
    body: (command [0, 0] - [0, 3]
      name: (command_name [0, 0] - [0, 3]
        (word [0, 0] - [0, 3])))
    redirect: (file_redirect [0, 4] - [0, 6]
      destination: (word [0, 5] - [0, 6]))
    (ERROR [0, 7] - [0, 9])
    redirect: (file_redirect [0, 9] - [0, 14]
      destination: (raw_string [0, 11] - [0, 14]))))

Compare that to the following, which works:

cat <<< 'x'

Parse tree:

(program [0, 0] - [1, 0]
  (command [0, 0] - [0, 11]
    name: (command_name [0, 0] - [0, 3]
      (word [0, 0] - [0, 3]))
    redirect: (herestring_redirect [0, 4] - [0, 11]
      (raw_string [0, 8] - [0, 11]))))

danfuzz avatar Oct 31 '23 18:10 danfuzz

I think the root cause here is that the grammar in this project has explicitly different rules for where herestrings are accepted vs. other redirects, whereas the actual shell grammar doesn't make that distinction (other than heredocs, which are of course super special).

danfuzz avatar Oct 31 '23 18:10 danfuzz

My branch https://github.com/danfuzz/tree-sitter-bash/tree/danfuzz-redirects has a proposed fix, though (as I write this) it doesn't produce identical output for the Herestrings test. So, perhaps it's more of a "suggested direction" at this point.

Clarification (after looking more closely): I think my proposed change improves the parse output in most cases of the test, specifically wrapping herestring-redirected statements in (redirected_statement ...) nodes. However, it does fail to parse a case where redirections and arguments are mixed, e.g. (simplified) x <<<x x >x.

danfuzz avatar Oct 31 '23 18:10 danfuzz

Update: I think the failure I noted above highlights a pre-existing bug in the grammar. In particular, it looks like (non-redirection) arguments after a file redirection are taken to be redirections. For example:

x <x a b c

Parse tree:

(program [0, 0] - [1, 0]
  (redirected_statement [0, 0] - [0, 10]
    body: (command [0, 0] - [0, 1]
      name: (command_name [0, 0] - [0, 1]
        (word [0, 0] - [0, 1])))
    redirect: (file_redirect [0, 2] - [0, 10]
      destination: (word [0, 3] - [0, 4])
      destination: (word [0, 5] - [0, 6])
      destination: (word [0, 7] - [0, 8])
      destination: (word [0, 9] - [0, 10]))))

danfuzz avatar Oct 31 '23 19:10 danfuzz

I've fixed the x <x a b c case but the first case in your issue is pretty difficult, throwing herestring_redirect under the 'redirect' field in the repeat1 body fixes it, but makes some parsing output in the tests worse, notably the same case I've fixed. But maybe that's desired, it'd be like this:

image

amaanq avatar Feb 10 '24 08:02 amaanq