Implement lexer and parser using virtual block tokens

Open imaqtkatt opened this issue 1 year ago • 2 comments

To improve the lexer/parser code and remove the current two-space indentation limitation, we can implement a lexer that generates virtual tokens.

test:
  wow:
    lol
         yay
    bend
  ok

Given the input above, the lexer should generate the following tokens:

["test", Colon, VirtualBegin, "wow", Colon, VirtualBegin, "lol", "yay", VirtualSemi, "bend", VirtualEnd, "ok", VirtualEnd]
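A minimal Rust sketch of how such a lexer could derive the virtual tokens from indentation. The names `lex_virtual` and `Token::Word`, and the rules themselves, are assumptions for illustration, not Bend's implementation: a line following a trailing colon opens a block at its own column, a line at the current block's column emits `VirtualSemi`, a deeper line is a continuation, and a shallower line closes blocks with no `VirtualSemi` after the `VirtualEnd`. Indentation is assumed to be spaces only.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String), // hypothetical stand-in for real lexemes
    Colon,
    VirtualBegin,
    VirtualSemi,
    VirtualEnd,
}

fn lex_virtual(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    // Columns of the currently open virtual blocks; column 0 is the implicit top level.
    let mut blocks: Vec<usize> = Vec::new();
    // True when the previous line ended with a colon (assumed block opener).
    let mut opens_block = false;
    let mut first_line = true;

    for line in src.lines() {
        let text = line.trim_start();
        if text.is_empty() {
            continue; // blank lines carry no layout information
        }
        let col = line.len() - text.len(); // assumes space-only indentation

        if opens_block {
            // The first line after a colon fixes the column of the new block.
            blocks.push(col);
            tokens.push(Token::VirtualBegin);
        } else if !first_line {
            // Close every block that is indented deeper than this line.
            let mut closed = false;
            while blocks.last().map_or(false, |&top| col < top) {
                blocks.pop();
                tokens.push(Token::VirtualEnd);
                closed = true;
            }
            // Same column as the current block: separate the two lexemes.
            let current = blocks.last().copied().unwrap_or(0);
            if col == current && !closed {
                tokens.push(Token::VirtualSemi);
            }
            // A deeper column is treated as a continuation: no token.
        }
        first_line = false;

        opens_block = false;
        for word in text.split_whitespace() {
            match word.strip_suffix(':') {
                Some(name) => {
                    if !name.is_empty() {
                        tokens.push(Token::Word(name.to_string()));
                    }
                    tokens.push(Token::Colon);
                    opens_block = true;
                }
                None => {
                    tokens.push(Token::Word(word.to_string()));
                    opens_block = false;
                }
            }
        }
    }

    // Close whatever is still open at end of input.
    tokens.extend(blocks.iter().map(|_| Token::VirtualEnd));
    tokens
}
```

Under these assumptions the example input produces the token list above. Because blocks are tracked by the column of their first lexeme rather than by a fixed step, any indentation width works.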

For the parser, we can switch between fun and imp syntax based on token lookahead, or by using the old #flavour imp in the middle of the file.

imaqtkatt, Jul 16 '24 22:07

I think the easiest and most flexible way would be to lex to this token structure:

enum Token {
  Char(char),
  LineBreak,
  Begin,
  End,
}

This keeps the text identical but consumes all whitespace and converts it into just the indentation information. We could switch between the two behaviours by flipping a flag in the lexer that tells it whether to generate the non-text tokens.

I think with this, converting our current parser is pretty trivial, since all the text searching stays the same; we just need to change starts_with and consume to use char tokens instead of chars.
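A rough sketch of that idea in Rust, under a few assumptions: the `layout` flag, the `lex_chars` function, and the `Parser` struct are hypothetical names; only leading indentation and line ends are turned into `Begin`, `End`, and `LineBreak`, while spaces inside a line stay as `Char(' ')`; and indentation is tracked Python-style with a stack of columns.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Char(char),
    LineBreak,
    Begin,
    End,
}

/// With `layout == false` the text is passed through unchanged as char tokens.
/// With `layout == true` indentation becomes Begin/End and newlines become LineBreak.
fn lex_chars(src: &str, layout: bool) -> Vec<Token> {
    if !layout {
        return src.chars().map(Token::Char).collect();
    }
    let mut tokens = Vec::new();
    let mut indents = vec![0usize]; // stack of open indentation columns
    for line in src.lines() {
        let text = line.trim_start();
        if text.is_empty() {
            continue;
        }
        let col = line.len() - text.len(); // assumes space-only indentation
        if col > *indents.last().unwrap() {
            indents.push(col);
            tokens.push(Token::Begin);
        } else {
            while col < *indents.last().unwrap() {
                indents.pop();
                tokens.push(Token::End);
            }
        }
        tokens.extend(text.chars().map(Token::Char));
        tokens.push(Token::LineBreak);
    }
    // Close blocks still open at end of input (the base column 0 is skipped).
    tokens.extend(indents.iter().skip(1).map(|_| Token::End));
    tokens
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    /// True if the upcoming tokens are exactly the chars of `text`.
    fn starts_with(&self, text: &str) -> bool {
        let mut rest = self.tokens[self.pos..].iter();
        text.chars()
            .all(|c| rest.next().copied() == Some(Token::Char(c)))
    }

    /// Advances past `text` if it is next, otherwise reports an error.
    fn consume(&mut self, text: &str) -> Result<(), String> {
        if self.starts_with(text) {
            self.pos += text.chars().count();
            Ok(())
        } else {
            Err(format!("expected `{}` at token {}", text, self.pos))
        }
    }
}
```

The text-matching helpers keep the same shape as their char-based versions; only the element type they compare against changes.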

I'm not sure what VirtualSemi is in your example.

developedby, Jul 17 '24 07:07

Here, VirtualSemi is produced between lexemes that start in the same column. The example input can be read as:

test {
  wow {
    lol yay;
    bend
  }
  ok
}

imaqtkatt, Jul 17 '24 11:07