Implement lexer and parser using virtual block tokens
To improve the lexer/parser code and also remove the current two-space indentation limitation, we can implement a lexer that generates virtual tokens.
    test:
      wow:
        lol
          yay
        bend
      ok
With the input above, the lexer should generate the following tokens:
["test", Colon, VirtualBegin, "wow", Colon, VirtualBegin, "lol", "yay", VirtualSemi, "bend", VirtualEnd, "ok", VirtualEnd]
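A minimal sketch of such a lexer, to make the rule concrete. Everything in it is an assumption rather than the existing implementation: the `Tok` enum and `lex` function are hypothetical names, a block opens at the column of the first lexeme after a Colon, a line starting deeper than the current block is a continuation (which is how `yay` is read here), and no VirtualSemi is emitted right after a VirtualEnd, since the token list above has none before "ok". Indentation is assumed to be ASCII spaces.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Colon,
    VirtualBegin,
    VirtualSemi,
    VirtualEnd,
}

/// Hypothetical layout lexer: turns indentation into virtual block tokens.
fn lex(src: &str) -> Vec<Tok> {
    let mut out: Vec<Tok> = Vec::new();
    // Start columns of the currently open virtual blocks.
    let mut blocks: Vec<usize> = Vec::new();
    // Set after a Colon: the next lexeme opens a block at its own column.
    let mut pending_begin = false;

    for line in src.lines() {
        let mut first_on_line = true;
        // Byte offsets equal columns as long as indentation is ASCII spaces.
        let mut chars = line.char_indices().peekable();
        while let Some(&(col, c)) = chars.peek() {
            if c.is_whitespace() {
                chars.next();
                continue;
            }
            if pending_begin {
                blocks.push(col);
                out.push(Tok::VirtualBegin);
                pending_begin = false;
            } else if first_on_line {
                // Dedent: close every block that starts to the right of us.
                while blocks.last().map_or(false, |&b| col < b) {
                    blocks.pop();
                    out.push(Tok::VirtualEnd);
                }
                // Same column as the open block: separate the two items,
                // unless a block was just closed (assumption, see above).
                if blocks.last() == Some(&col) && out.last() != Some(&Tok::VirtualEnd) {
                    out.push(Tok::VirtualSemi);
                }
            }
            first_on_line = false;

            if c == ':' {
                chars.next();
                out.push(Tok::Colon);
                pending_begin = true;
            } else {
                let mut word = String::new();
                while let Some(&(_, c)) = chars.peek() {
                    if c.is_whitespace() || c == ':' {
                        break;
                    }
                    word.push(c);
                    chars.next();
                }
                out.push(Tok::Word(word));
            }
        }
    }
    // End of input closes whatever is still open.
    for _ in blocks.drain(..) {
        out.push(Tok::VirtualEnd);
    }
    out
}
```

Because blocks are keyed on the column of their first lexeme, any indentation width works, not just two spaces. The sketch does not handle empty blocks or tabs.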
For the parser, we can switch between the fun and imp syntaxes based on token lookahead, or by using the old #flavour imp in the middle of the file.
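I think the easiest and most flexible way would be to lex to this token structure:

    enum Token {
        Char(char),   // one character of source text, unchanged
        LineBreak,    // new line at the same indentation level
        Begin,        // indentation increased
        End,          // indentation decreased
    }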
This keeps the text identical but consumes all whitespace and converts it into just the indentation information. We could swap between the two behaviours by flipping a flag in the lexer that tells it whether or not to generate the non-text tokens.
I think with this the conversion of our current parser is pretty trivial, since all the text searching stays the same. We just need to change starts_with and consume to use char tokens instead of chars.
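Roughly what I have in mind, as a sketch and not a final design. The `layout` flag, the `Parser` struct and its fields are hypothetical names. I am also assuming that only leading indentation and newlines get consumed (spaces inside a line stay as `Char(' ')` so words remain separated), and that with the flag off every character, whitespace included, comes through as a `Char`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Char(char),
    LineBreak,
    Begin,
    End,
}

/// Hypothetical lexer: `layout` decides whether non-text tokens are generated.
fn lex(src: &str, layout: bool) -> Vec<Token> {
    if !layout {
        // Flag off: plain character stream, whitespace untouched.
        return src.chars().map(Token::Char).collect();
    }
    let mut out = Vec::new();
    // Stack of indentation widths; the bottom entry 0 is never popped.
    let mut indents = vec![0usize];
    for line in src.lines() {
        let text = line.trim_start();
        if text.is_empty() {
            continue; // blank lines carry no layout information
        }
        let indent = line.len() - text.len();
        if indent > *indents.last().unwrap() {
            indents.push(indent);
            out.push(Token::Begin);
        } else {
            while indent < *indents.last().unwrap() {
                indents.pop();
                out.push(Token::End);
            }
            if !out.is_empty() {
                out.push(Token::LineBreak);
            }
        }
        out.extend(text.trim_end().chars().map(Token::Char));
    }
    // Close blocks still open at end of input.
    while indents.len() > 1 {
        indents.pop();
        out.push(Token::End);
    }
    out
}

/// Hypothetical parser state; only the two helpers mentioned above are shown.
struct Parser {
    toks: Vec<Token>,
    pos: usize,
}

impl Parser {
    /// Same idea as the char-based starts_with, but matching Char tokens.
    fn starts_with(&self, text: &str) -> bool {
        let rest = &self.toks[self.pos..];
        text.chars()
            .enumerate()
            .all(|(i, c)| rest.get(i) == Some(&Token::Char(c)))
    }

    /// Advances past `text` if it is next, otherwise reports an error.
    fn consume(&mut self, text: &str) -> Result<(), String> {
        if self.starts_with(text) {
            self.pos += text.chars().count();
            Ok(())
        } else {
            Err(format!("expected {:?} at token {}", text, self.pos))
        }
    }
}
```

Since `Begin`, `End` and `LineBreak` never compare equal to a `Char`, the text-matching helpers stop at layout boundaries on their own, and the imp parser can match those tokens explicitly where it needs them.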
I'm not sure what VirtualSemi is in your example.
Here VirtualSemi is produced between lexemes that start in the same column within a block. The example input can be read as:
    test {
      wow {
        lol yay;
        bend
      }
      ok
    }
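To check that reading mechanically, here is a small sketch that prints a virtual-token list in the braced form. The `Tok` enum and `render` function are hypothetical names for illustration, and the Colon is assumed to be absorbed into the brace that follows it.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Colon,
    VirtualBegin,
    VirtualSemi,
    VirtualEnd,
}

/// Renders virtual tokens as explicit braces and semicolons on one line.
fn render(toks: &[Tok]) -> String {
    let mut parts: Vec<String> = Vec::new();
    for tok in toks {
        match tok {
            Tok::Word(w) => parts.push(w.clone()),
            // Assumption: a Colon only introduces a block, so it prints nothing.
            Tok::Colon => {}
            Tok::VirtualBegin => parts.push("{".to_string()),
            Tok::VirtualEnd => parts.push("}".to_string()),
            // The separator attaches to the lexeme before it.
            Tok::VirtualSemi => {
                if let Some(last) = parts.last_mut() {
                    last.push(';');
                }
            }
        }
    }
    parts.join(" ")
}
```

Feeding it the token list from the first comment gives the braced example above, flattened onto a single line.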