Tokenizer: Fix line continuation after punctuation
Prior to this change, any line ending with [punctuation + '...'], for
example ||..., would cause the tokenizer to fail.
Fixes #9
I know that this is a somewhat ugly hack because it goes back 3 positions. Feel free to reject or improve if you think this is not good.
EDIT: I deleted my suggestion since it disregarded the AND operation.
Also, my version (and I assume yours as well) does not throw a "no spaces before operator '...'" warning after successfully parsing it. Shouldn't that happen? If so, it is a different issue and should be addressed in a separate PR, I assume.
In your solution, wouldn't a `||...` match the `% a binary operator, followed by a unary operator:` case and thus be parsed as `||..` and `.`?
Also, even in a simple case like `symbol = '.'`, it would be recognized as `% a binary operator, followed by a unary operator`, thus adding both an empty token and the `.`. I think the `any` in this case is conceptually wrong.
I noticed that my solution did break things, yes.
Regarding the warning for '...': your version does throw it correctly. Sorry for the noise. :)
So your version works entirely as intended then, I suppose? The only question would be if the implementation could be improved.
Yep, my implementation could definitely be improved. I won't do that today any more, though.
I think it is good enough actually, since it allows the following statements to handle it right afterwards.
Just added some documentation and replaced endsWith() with a strcmp() comparison to preserve compatibility with older Matlab versions.
symbol = skip(punctuation);
% ends with '...':
% The '...' has to be unskipped and handled here in order
% to not cause an error for line endings such as `+...`
% or `&&...`.
if length(symbol) > 3 && strcmp(symbol(end-2:end), '...')
    pos = pos - 3;
    symbol = symbol(1:end-3);
end
% one operator:
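For illustration, here is a minimal standalone sketch of what the check above does to a few punctuation runs. The input list and the assumption that `pos` points just past the skipped symbol are mine, not taken from the tokenizer; `symbols` and `stripped` are hypothetical names.

```matlab
% Hypothetical punctuation runs, as skip(punctuation) might return them.
symbols = {'+...', '&&...', '||...', '...', '=='};
stripped = cell(size(symbols));

for k = 1:numel(symbols)
    symbol = symbols{k};
    pos = length(symbol) + 1;   % assumption: pos points just past the symbol

    % Same check as in the snippet above: only strip a trailing '...'
    % when something precedes it (length > 3).
    if length(symbol) > 3 && strcmp(symbol(end-2:end), '...')
        pos = pos - 3;              % unskip the '...'
        symbol = symbol(1:end-3);   % keep only the operator part
    end

    stripped{k} = symbol;
    fprintf('%s -> symbol ''%s'', pos %d\n', symbols{k}, symbol, pos);
end
```

With these inputs, `+...`, `&&...` and `||...` are reduced to their operator, while a bare `...` and `==` pass through unchanged, so the statements that follow can handle the continuation on their own.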
Actually, this (both your approach and mine) will still break if you write bad but perfectly valid Matlab code like `&&.....` (note the superfluous dots at the end). I've just created another approach where the tokenizer does not try to parse two operators at once (and no longer jumps between parsing from left to right and from right to left): #13.
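A small sketch of that failure case, assuming the same end-of-symbol check as in the snippet above. The `operators` list is an abbreviated, hypothetical stand-in for whatever the tokenizer actually compares against.

```matlab
symbol = '&&.....';   % '&&', then '...', then two superfluous dots

% The check strips the LAST three characters, not the first '...'.
if length(symbol) > 3 && strcmp(symbol(end-2:end), '...')
    symbol = symbol(1:end-3);
end

% Assumption: a short stand-in for the tokenizer's operator list.
operators = {'&&', '||', '+', '-'};
is_operator = any(strcmp(symbol, operators));

fprintf('remaining symbol: ''%s'', known operator: %d\n', symbol, is_operator);
```

The remainder is `&&..` rather than `&&`, so it no longer matches any operator. Matlab itself ignores everything after the first `...` on a line, which is why the input is valid even though the check mishandles it.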
I've updated this with your suggestions.
Is my preview of the commit wrong after the force-push or did you make a mistake? :)
I did. Forgot to commit+amend before pushing :-P
Now you should see your suggestion in the diff ;-)