Translation::Parser: small lexer incompatibilities with heredocs

Open noahgibbs opened this issue 1 year ago • 0 comments

Here's the sample code I'm starting from:

# parser_gem_lexer_test.rb
require "parser/current"
require "prism"
require "prism/translation/parser"

h = <<'HEREDOC'
  <<~RUBY
    1
  RUBY
HEREDOC

b = Parser::Source::Buffer.new("inline ruby", 1)
b.source = h

_, _, tokens = Parser::CurrentRuby.default_parser.tokenize(b)
puts "Parser:"
pp tokens

_, _, tokens = Prism::Translation::Parser.new.tokenize(b)
puts "Prism:"
pp tokens

If I run this with "ruby -Ilib ./parser_gem_lexer_test.rb", the two lexers produce similar output, but not quite identical:

Parser:
[[:tSTRING_BEG, ["<<\"", #<Parser::Source::Range inline ruby 2...9>]],
 [:tSTRING_CONTENT, ["1\n", #<Parser::Source::Range inline ruby 10...16>]],
 [:tSTRING_END, ["RUBY", #<Parser::Source::Range inline ruby 16...22>]],
 [:tNL, [nil, #<Parser::Source::Range inline ruby 9...10>]]]
Prism:
[[:tSTRING_BEG, ["<<\"", #<Parser::Source::Range inline ruby 2...9>]],
 [:tSTRING_CONTENT, ["    1\n", #<Parser::Source::Range inline ruby 10...16>]],
 [:tSTRING_END, ["  RUBY\n", #<Parser::Source::Range inline ruby 16...23>]],
 [:tNL, [nil, #<Parser::Source::Range inline ruby 9...10>]]]

Notice the differences in the tSTRING_END source range (16...22 vs 16...23) and in the string values ("1\n" vs "    1\n" for tSTRING_CONTENT, and "RUBY" vs "  RUBY\n" for tSTRING_END).
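For anyone who wants to spot these mismatches without reading the two dumps side by side, here is a minimal sketch. It assumes each token has been reduced to a plain [type, value, range_begin, range_end] array; `diff_tokens` is a hypothetical helper, not part of either gem, and the token data is transcribed by hand from the output above.

```ruby
# Hypothetical helper: compares two token lists pairwise and returns the
# entries that differ. Each token is simplified to
# [type, value, range_begin, range_end].
def diff_tokens(expected, actual)
  expected.zip(actual).each_with_index.filter_map do |(exp, act), index|
    next if exp == act
    { index: index, expected: exp, actual: act }
  end
end

# Token data transcribed from the output above.
parser_tokens = [
  [:tSTRING_BEG, "<<\"", 2, 9],
  [:tSTRING_CONTENT, "1\n", 10, 16],
  [:tSTRING_END, "RUBY", 16, 22],
  [:tNL, nil, 9, 10],
]
prism_tokens = [
  [:tSTRING_BEG, "<<\"", 2, 9],
  [:tSTRING_CONTENT, "    1\n", 10, 16],
  [:tSTRING_END, "  RUBY\n", 16, 23],
  [:tNL, nil, 9, 10],
]

mismatches = diff_tokens(parser_tokens, prism_tokens)
mismatches.each do |m|
  puts "token #{m[:index]}: parser=#{m[:expected].inspect} prism=#{m[:actual].inspect}"
end
```

Only the tSTRING_CONTENT and tSTRING_END entries are reported; tSTRING_BEG and tNL already match.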

I figured out a partial fix for the tilde-heredocs here: https://github.com/noahgibbs/prism/tree/parser_heredocs

However, it's only a partial fix -- there's a lot to heredocs, and a number of related lexer incompatibilities. And my solution was a lot of code. Rather than submitting a PR now I'll make a note of the problem here and come back to it later. If we revisit heredocs in the Translation::Parser lexer we'll probably want to do more than just this.
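To make the gap concrete, here is a rough sketch of the normalization that the Parser output above implies for this one example. This is not the code from the linked branch. Both helpers are hypothetical, handle space indentation only, and ignore tabs, interpolation, and line continuations, which a real fix would have to deal with.

```ruby
# Hypothetical helper: strips the common leading indentation from
# squiggly-heredoc body lines. Whitespace-only lines are left alone and do
# not count toward the indentation width. Spaces only; tabs are not handled.
def dedent_heredoc_lines(lines)
  indent = lines.reject { |line| line.strip.empty? }
                .map { |line| line[/\A */].length }
                .min || 0
  lines.map { |line| line.strip.empty? ? line : line[indent..] }
end

# Hypothetical helper: trims the closing delimiter's value and drops a
# trailing newline from its range, matching the Parser output shown above
# (value "RUBY", range 16...22).
def normalize_heredoc_end(value, range_begin, range_end)
  range_end -= 1 if value.end_with?("\n")
  [value.strip, range_begin, range_end]
end

p dedent_heredoc_lines(["    1\n"])
p normalize_heredoc_end("  RUBY\n", 16, 23)
```

Applied to the Prism values from the example, this yields the content "1\n" and the end token "RUBY" with range 16...22, which is what the parser gem reports.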

Feb 22 '24 14:02 noahgibbs