Translation::Parser: small lexer incompatibilities with heredocs
Here's the sample code I'm starting from:
# parser_gem_lexer_test.rb
require "parser/current"
require "prism"
require "prism/translation/parser"
h = <<'HEREDOC'
  <<~RUBY
    1
  RUBY
HEREDOC
b = Parser::Source::Buffer.new("inline ruby", 1)
b.source = h
_, _, tokens = Parser::CurrentRuby.default_parser.tokenize(b)
puts "Parser:"
pp tokens
_, _, tokens = Prism::Translation::Parser.new.tokenize(b)
puts "Prism:"
pp tokens
If I run this with "ruby -Ilib ./parser_gem_lexer_test.rb", the two lexers produce similar but not quite identical output:
Parser:
[[:tSTRING_BEG, ["<<\"", #<Parser::Source::Range inline ruby 2...9>]],
[:tSTRING_CONTENT, ["1\n", #<Parser::Source::Range inline ruby 10...16>]],
[:tSTRING_END, ["RUBY", #<Parser::Source::Range inline ruby 16...22>]],
[:tNL, [nil, #<Parser::Source::Range inline ruby 9...10>]]]
Prism:
[[:tSTRING_BEG, ["<<\"", #<Parser::Source::Range inline ruby 2...9>]],
[:tSTRING_CONTENT, [" 1\n", #<Parser::Source::Range inline ruby 10...16>]],
[:tSTRING_END, [" RUBY\n", #<Parser::Source::Range inline ruby 16...23>]],
[:tNL, [nil, #<Parser::Source::Range inline ruby 9...10>]]]
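Notice the differences in the tSTRING_END source location (16...22 vs 16...23) and in the string values: Parser strips the leading indentation and the trailing newline ("1\n" vs " 1\n" for tSTRING_CONTENT, "RUBY" vs " RUBY\n" for tSTRING_END), while Prism does not.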
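To make mismatches like these easier to spot, here is a minimal sketch of a token-by-token comparison. The `token_diffs` helper is hypothetical (it is not part of either gem), and plain Ruby ranges stand in for `Parser::Source::Range` so the snippet runs without the gems installed. The token data is copied from the output above.

```ruby
# Hypothetical helper: compare two token streams position by position.
# Each token is modeled as [type, [value, range]], with a plain Ruby Range
# standing in for Parser::Source::Range.
def token_diffs(expected, actual)
  diffs = []
  [expected.length, actual.length].max.times do |i|
    exp = expected[i]
    act = actual[i]
    diffs << { index: i, expected: exp, actual: act } unless exp == act
  end
  diffs
end

# Token data copied from the Parser output above.
parser_tokens = [
  [:tSTRING_BEG, ["<<\"", 2...9]],
  [:tSTRING_CONTENT, ["1\n", 10...16]],
  [:tSTRING_END, ["RUBY", 16...22]],
  [:tNL, [nil, 9...10]]
]

# Token data copied from the Prism output above.
prism_tokens = [
  [:tSTRING_BEG, ["<<\"", 2...9]],
  [:tSTRING_CONTENT, [" 1\n", 10...16]],
  [:tSTRING_END, [" RUBY\n", 16...23]],
  [:tNL, [nil, 9...10]]
]

token_diffs(parser_tokens, prism_tokens).each do |d|
  puts "token #{d[:index]}: expected #{d[:expected].inspect}, got #{d[:actual].inspect}"
end
```

With the real gems, the same comparison would need to go through the range objects' begin and end positions instead of relying on `==` between ranges from different tokenizer runs; that part is an assumption I have not checked.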
I figured out a partial fix for the tilde-heredocs here: https://github.com/noahgibbs/prism/tree/parser_heredocs
However, it's only a partial fix -- there's a lot to heredocs, and a number of related lexer incompatibilities. And my solution was a lot of code. Rather than submitting a PR now I'll make a note of the problem here and come back to it later. If we revisit heredocs in the Translation::Parser lexer we'll probably want to do more than just this.
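For reference, the string-value half of the mismatch comes down to the dedent that `<<~` implies. Below is a rough sketch of that step, assuming space-only indentation and no interpolation. `dedent_heredoc_lines` is a hypothetical name, and this is not the code in the branch linked above; real squiggly heredocs also have to deal with tabs, escapes, and interpolated parts.

```ruby
# Hypothetical helper, not taken from the linked branch.
# Removes the common leading-space indentation from heredoc body lines,
# which is roughly what `<<~` does for simple, space-indented bodies.
def dedent_heredoc_lines(lines)
  # Blank lines do not count toward the common indentation.
  indent = lines.reject { |line| line.strip.empty? }
                .map { |line| line[/\A */].length }
                .min || 0
  # Strip at most `indent` leading spaces from every line.
  lines.map { |line| line.sub(/\A {0,#{indent}}/, "") }
end

p dedent_heredoc_lines(["    1\n"]) # => ["1\n"]
```

The closing terminator would need similar treatment: the Parser gem reports it as "RUBY", without its indentation or trailing newline, and with a source range that ends before the newline.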