Wrong Token line and column when a code line is continued with a backslash
When we define a multi-line macro, such as:
6: #define THREE 1 \
7: + \
8: 2
we would expect token.getLine() for the Token representing the number "2" to return 8. Surprisingly, the entire #define is treated as a one-line preprocessor directive, so the result is 6.
The token list of such a macro shows every token on line 6 (note that this dump is for a function-like macro A(a, b), not for THREE):
[HASH@6,0]:"#"
[IDENTIFIER@6,1]:"define"
[IDENTIFIER@6,8]:"A"
[(@6,9]:"("
[IDENTIFIER@6,10]:"a"
[,@6,11]:","
[IDENTIFIER@6,13]:"b"
[)@6,14]:")"
[IDENTIFIER@6,16]:"a"
[WHITESPACE@6,17]:" "
[+@6,21]:"+"
[WHITESPACE@6,22]:" "
[IDENTIFIER@6,26]:"b"
[NL@6,27]:"\n"
I'm aware that the token list in Macro is not public, but the line and column numbers should still be correct.
Mm, this is presumably due to a quirk of the cpp spec, where backslash-newline is elided and reinserted after the line. We use JoinReader to elide the backslash-newline sequences. To fix this, we will likely have to merge JoinReader into LexerSource.
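To illustrate why eliding backslash-newline loses positions, and what keeping them could look like, here is a minimal sketch. It is not the library's JoinReader or LexerSource: the class SpliceMap and its fields are hypothetical, it assumes 0-based columns as in the dump above, and it handles only "\n" line endings (not "\r\n"). It removes each backslash-newline pair but records the physical line and column of every character that survives.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the library. */
final class SpliceMap {
    final String text;     // source with backslash-newline pairs removed
    final int[] lines;     // physical line of each character in text
    final int[] columns;   // physical 0-based column of each character in text

    private SpliceMap(String text, int[] lines, int[] columns) {
        this.text = text;
        this.lines = lines;
        this.columns = columns;
    }

    /** Elides backslash-newline pairs, remembering where each kept character came from. */
    static SpliceMap splice(String source, int firstLine) {
        StringBuilder out = new StringBuilder();
        List<int[]> positions = new ArrayList<>();
        int line = firstLine;
        int col = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            boolean continuation = c == '\\'
                    && i + 1 < source.length()
                    && source.charAt(i + 1) == '\n';
            if (continuation) {
                i++;        // skip the newline as well as the backslash
                line++;     // but still count the physical line
                col = 0;
                continue;
            }
            out.append(c);
            positions.add(new int[] {line, col});
            if (c == '\n') {
                line++;
                col = 0;
            } else {
                col++;
            }
        }
        int[] lines = new int[positions.size()];
        int[] columns = new int[positions.size()];
        for (int i = 0; i < lines.length; i++) {
            lines[i] = positions.get(i)[0];
            columns[i] = positions.get(i)[1];
        }
        return new SpliceMap(out.toString(), lines, columns);
    }

    public static void main(String[] args) {
        // The THREE macro from the report, starting at physical line 6.
        SpliceMap m = splice("#define THREE 1 \\\n+ \\\n2\n", 6);
        int i = m.text.indexOf('2');
        // Prints: '2' is at line 8, column 0
        System.out.println("'2' is at line " + m.lines[i] + ", column " + m.columns[i]);
    }
}
```

A lexer working on the spliced text could look up a token's start offset in these arrays instead of counting newlines itself, which is one possible way to get line 8 for the "2" rather than 6.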
Hrrrrnnnng. OK, I accept this as a good bug, but I'll have to think about how to fix it!
What's more, if a string literal is broken by backslashes into a multi-line token, the locations of the tokens that follow it are wrong. Example:
4: char *string = "a \
5: b \
6: c";
The expected location of the semicolon is (6, 8), but the actual location is (4, 31).
You're quite right. I need to merge JoinReader into LexerSource, but how one knows whether to unget a \ is a little beyond me at this time in the morning. Suggestions taken, else I'll get there soon enough. :-) I appreciate the test cases.
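On the question of when to unget after a backslash, one possible approach is a single character of lookahead: read the character after the backslash, drop both if it is a newline, and otherwise push it back and return the backslash unchanged. The sketch below assumes this approach. SplicingReader is a hypothetical class, not the library's code; it handles only "\n" line endings and does not cover the harder case of the lexer itself ungetting a character across a continuation, where the line count would also have to be rolled back.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical reader for illustration only; not the library's JoinReader or LexerSource. */
final class SplicingReader {
    private final PushbackReader in;
    private int line;

    SplicingReader(Reader reader, int firstLine) {
        this.in = new PushbackReader(reader, 1);   // one character of lookahead is enough
        this.line = firstLine;
    }

    /**
     * Physical line the reader has reached. Called right after read() returns a
     * non-newline character, this is the line that character was on.
     */
    int getLine() {
        return line;
    }

    /** Returns the next character with backslash-newline pairs elided, or -1 at end of input. */
    int read() {
        try {
            for (;;) {
                int c = in.read();
                if (c == '\\') {
                    int next = in.read();
                    if (next == '\n') {
                        line++;        // continuation: drop both characters, keep counting lines
                        continue;
                    }
                    if (next != -1) {
                        in.unread(next);   // not a continuation: put the lookahead back
                    }
                    return c;
                }
                if (c == '\n') {
                    line++;
                }
                return c;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The string example from the report, starting at physical line 4.
        SplicingReader r = new SplicingReader(new StringReader("\"a \\\nb \\\nc\";"), 4);
        int c;
        while ((c = r.read()) != -1) {
            System.out.println((char) c + " on line " + r.getLine());
        }
    }
}
```

Because the line counter lives in the same object that elides the continuation, the semicolon in the example is reported on line 6 rather than line 4.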