MatlabCodeAnalyzer Tokenizer: Fix line continuation after punctuation

Prior to this change, any line ending with [punctuation + '...'], for example ||..., would cause the tokenizer to fail.

Fixes #9

Jan 19 '21 17:01 ChristianStadelmann

I know that this is a somewhat ugly hack because it goes back 3 positions. Feel free to reject or improve if you think this is not good.

Jan 19 '21 17:01 ChristianStadelmann

EDIT: I deleted my suggestion since it disregarded the AND operation.

Also, my version (and I assume yours as well), does not throw a no spaces before operator '...' warning after successfully parsing it. Wouldn't that be supposed to happen? If so, it is a different issue and should be addressed in a separate PR, I assume.

Jan 19 '21 17:01 HaHeho

In your solution, wouldn't a ||... match the % a binary operator, followed by a unary operator: case and thus parse it as ||.. and .?

Also, even in a simple case like symbol = '.', it would be recognized as % a binary operator, followed by a unary operator and thus adding both an empty token and the .. I think the any in this case is conceptually wrong.

Jan 19 '21 18:01 ChristianStadelmann

In your solution, wouldn't a ||... match the % a binary operator, followed by a unary operator: case and thus parse it as ||.. and .?

Also, even in a simple case like symbol = '.', it would be recognized as % a binary operator, followed by a unary operator and thus adding both an empty token and the .. I think the any in this case is conceptually wrong.

I noticed that my solution did break things, yes.

Also, my version (and I assume yours as well), does not throw a no spaces before operator '...' warning after successfully parsing it. Wouldn't that be supposed to happen? If so, it is a different issue and should be addressed in a separate PR, I assume.

Your version does throw it correctly. Sorry for the for the noise. :)

So your version works entirely as intended then, I suppose? The only question would be if the implementation could be improved.

Jan 19 '21 18:01 HaHeho

Yep, my implementation could definitely be improved. I won't do that today any more, though.

Jan 19 '21 18:01 ChristianStadelmann

I think it is good enough actually, since it allows the following statements to handle it right afterwards.

Just added some documentation and replaced the endsWith() to preserve compatibility with older Matlab versions.

                symbol = skip(punctuation);
                % ends with '...':
                % The '...' has to be unskipped and handled here in order
                % to not cause and error for line endings such as `+...` 
                % or `&&...`.
                if length(symbol) > 3 && strcmp(symbol(end-2:end), '...')
                    pos = pos - 3;
                    symbol = symbol(1:end-3);
                end
                % one operator:

Jan 19 '21 18:01 HaHeho

Actually, this (both your and mine approach) will still break if you write bad but perfectly valid Matlab code like &&..... (note the superfluous dots at the end). I've just created another approach where the tokenizer does not try to parse two operators at once (and jumps between parsing from left to right and from right to left): #13.

Jan 19 '21 22:01 ChristianStadelmann

I've updated this with your suggestions.

Jan 19 '21 22:01 ChristianStadelmann

I've updated this with your suggestions.

Is my preview of the commit wrong after the force-push or did you make a mistake? :)

Jan 20 '21 00:01 HaHeho

Is my preview of the commit wrong after the force-push or did you make a mistake? :)

I did. Forgot to commit+amend before pushing :-P

Now you should see your suggestion in the diff ;-)

Jan 20 '21 08:01 ChristianStadelmann