jcpp

Wrong token line and column when a code line is continued with a backslash

Open grzegorz8 opened this issue 12 years ago • 3 comments

When we define a multi-line macro, such as:

6: #define THREE 1 \
7:    + \
8:    2

we would expect token.getLine() for the Token representing the number "2" to return 8. Instead, the entire #define is treated as a one-line preprocessor directive, so the result is 6.

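The token list for a macro defined this way (this dump is from a function-like macro A(a, b) rather than THREE, but every token is likewise reported on line 6):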

[HASH@6,0]:"#"
[IDENTIFIER@6,1]:"define"
[IDENTIFIER@6,8]:"A"
[(@6,9]:"("
[IDENTIFIER@6,10]:"a"
[,@6,11]:","
[IDENTIFIER@6,13]:"b"
[)@6,14]:")"
[IDENTIFIER@6,16]:"a"
[WHITESPACE@6,17]:"    "
[+@6,21]:"+"
[WHITESPACE@6,22]:"    "
[IDENTIFIER@6,26]:"b"
[NL@6,27]:"\n"

I'm aware that the token list in Macro is not public, but the line and column numbers should still be correct.

grzegorz8 Feb 01 '14 20:02

Hmm, this is presumably due to a quirk in the cpp spec where backslash-newline is elided and reinserted after the line. We use JoinReader to elide the backslash-newline sequences. To fix this, we will likely have to merge JoinReader into LexerSource.

Hrrrrnnnng. OK, I accept this as a good bug, but I'll have to think about how to fix it!

shevek Feb 02 '14 20:02

What's more, if a string literal is broken by backslashes into a multi-line token, the location of the tokens that follow it is wrong. Example:

4: char *string = "a \
5:     b \
6:     c";

The expected location of the semicolon is (6, 8), but the actual location is (4, 31).
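To make the discrepancy concrete, here is a minimal sketch of mapping an offset in the spliced text back to its position in the original source. SpliceMap is a hypothetical helper, not part of jcpp, and it assumes 0-based columns and plain "\n" line endings. The columns it computes depend on the exact whitespace of the input, which this thread does not preserve, so they need not match the (6, 8) and (4, 31) quoted above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of jcpp): maps spliced-text offsets to source positions. */
final class SpliceMap {
    /** The source text with every backslash-newline pair removed. */
    final String joined;
    /** For each character of {@code joined}, its {line, column} in the original source. */
    private final List<int[]> positions = new ArrayList<>();

    SpliceMap(String source, int firstLine) {
        StringBuilder out = new StringBuilder();
        int line = firstLine;
        int column = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\\' && i + 1 < source.length() && source.charAt(i + 1) == '\n') {
                // Line continuation: drop both characters but still advance the source line.
                i++;
                line++;
                column = 0;
                continue;
            }
            out.append(c);
            positions.add(new int[] {line, column});
            if (c == '\n') {
                line++;
                column = 0;
            } else {
                column++;
            }
        }
        joined = out.toString();
    }

    /** Returns {line, column} in the original source for an offset into {@code joined}. */
    int[] originalPosition(int joinedOffset) {
        return positions.get(joinedOffset);
    }

    public static void main(String[] args) {
        String src = "char *string = \"a \\\n    b \\\n    c\";\n";
        SpliceMap map = new SpliceMap(src, 4);
        int semi = map.joined.indexOf(';');
        int[] pos = map.originalPosition(semi);
        System.out.println("spliced offset " + semi + " -> line " + pos[0] + ", column " + pos[1]);
    }
}
```

A lexer that only sees the spliced text can report nothing better than the first line plus the spliced offset, which is the shape of the wrong answer above. Keeping a per-character table like this is one way to recover the real position; tracking positions while reading is another.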

grzegorz8 Mar 02 '14 21:03

You're quite right. I need to merge JoinReader into LexerSource, but how one knows whether to unget a \ is a little beyond me at this time in the morning. Suggestions taken, else I'll get there soon enough. :-) I appreciate the test cases.
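One possible answer to the unget question, as a rough sketch only: read one character of lookahead after each backslash, and push back just that lookahead character when it is not a newline. SplicingReader and its methods are hypothetical names, not jcpp's JoinReader or LexerSource. The sketch assumes 0-based columns and "\n" line endings; handling "\r\n" would need a larger pushback buffer.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch (not jcpp code): splices lines while tracking source positions. */
final class SplicingReader {
    private final PushbackReader in;
    private int line;          // source line of the next character to be read
    private int column = 0;    // source column of the next character to be read
    private int lastLine;      // source position of the character last returned
    private int lastColumn;

    SplicingReader(Reader reader, int firstLine) {
        this.in = new PushbackReader(reader, 1);
        this.line = firstLine;
    }

    /** Returns the next character after line splicing, or -1 at end of input. */
    int read() throws IOException {
        for (;;) {
            int c = in.read();
            if (c == '\\') {
                int next = in.read();
                if (next == '\n') {
                    // Backslash-newline: swallow both and continue on the next source line.
                    line++;
                    column = 0;
                    continue;
                }
                // Not a continuation: the backslash is an ordinary character,
                // so only the lookahead character is pushed back.
                if (next != -1) {
                    in.unread(next);
                }
            }
            lastLine = line;
            lastColumn = column;
            if (c == '\n') {
                line++;
                column = 0;
            } else if (c != -1) {
                column++;
            }
            return c;
        }
    }

    int getLine() { return lastLine; }
    int getColumn() { return lastColumn; }

    /** Convenience for experiments: {line, column} of the first {@code target}, or null. */
    static int[] locate(String source, int firstLine, char target) {
        try {
            SplicingReader r = new SplicingReader(new StringReader(source), firstLine);
            for (int c = r.read(); c != -1; c = r.read()) {
                if (c == target) {
                    return new int[] {r.getLine(), r.getColumn()};
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, SplicingReader.locate("#define THREE 1 \\\n   + \\\n   2\n", 6, '2') places the "2" on line 8 rather than line 6, because the line counter advances even though the backslash-newline itself is never returned to the caller.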

shevek Mar 03 '14 19:03