It takes a long time to parse consecutive "[".
It takes a long time to parse consecutive "[". For example, text like this: "[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[aa"
Why would anyone write dozens of consecutive [? o.O
Actually, I wrote the log in Markdown.
2688 [18/Aug/2013:18:24 2735 [18/Aug/2013:18:25 2934 [18/Aug/2013:18:26 2786 [18/Aug/2013:18:27 3043 [18/Aug/2013:18:28 3361 [18/Aug/2013:18:29 3366 [18/Aug/2013:18:30 3323 [18/Aug/2013:18:31 3496 [18/Aug/2013:18:32 3139 [18/Aug/2013:18:33 3069 [18/Aug/2013:18:34 3071 [18/Aug/2013:18:35 2780 [18/Aug/2013:18:36 2183 [18/Aug/2013:18:37 2016 [18/Aug/2013:18:38 2206 [18/Aug/2013:18:39
Similar problem as https://github.com/sirthias/pegdown/issues/43
Pegdown tries to nest each opening [ as the (probable) start of a link after the first [, but cannot find enough closing ] to arrive at a valid tree. It then treats the first [ as a literal special character and repeats the same traversal over and over again for each of the following opening ['s.
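To make the blow-up concrete, here is a self-contained sketch of that backtracking pattern (this is illustrative, not pegdown's actual code): a naive recursive-descent parser for the toy grammar `inline = link / literal`, `link = '[' inline* ']'`. Every unmatched [ is first tried as a link start; when no ] is found, the whole inner parse is thrown away and the same suffix is re-parsed with the [ demoted to a literal, which makes the number of rule attempts grow exponentially with the number of unclosed brackets.

```java
public class BracketBacktrackDemo {
    private static long attempts;

    /** Counts how many times the inline rule is attempted while parsing s. */
    public static long countAttempts(String s) {
        attempts = 0;
        parseInlineSeq(s, 0);
        return attempts;
    }

    // inline*: consume as many inlines as possible, return the end position.
    private static int parseInlineSeq(String s, int i) {
        while (i < s.length()) {
            int j = parseInline(s, i);
            if (j < 0) break;
            i = j;
        }
        return i;
    }

    // inline = link / any char except ']'. Returns the new position, or -1.
    private static int parseInline(String s, int i) {
        attempts++;
        int j = parseLink(s, i);
        if (j >= 0) return j;
        return s.charAt(i) == ']' ? -1 : i + 1;
    }

    // link = '[' inline* ']'. On a missing ']' the entire inner parse is
    // discarded -- this is where the exponential re-evaluation comes from.
    private static int parseLink(String s, int i) {
        if (s.charAt(i) != '[') return -1;
        int j = parseInlineSeq(s, i + 1);
        if (j < s.length() && s.charAt(j) == ']') return j + 1;
        return -1;
    }

    public static void main(String[] args) {
        // Attempts grow roughly as 3 * 2^k for k unclosed brackets.
        for (int k = 5; k <= 20; k += 5) {
            String input = "[".repeat(k) + "aa";
            System.out.println(k + " brackets -> " + countAttempts(input) + " attempts");
        }
    }
}
```

With 37 unclosed brackets, as in the reported input, this toy parser would need on the order of 10^11 rule attempts, which matches the "takes a long time" symptom.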
Thank you, I understand the problem now. Will it be fixed?
It is a general problem with the design of the parser. It will trigger with almost any stream of opening characters, e.g. also with <. Currently, only emphasis and strong (* \ ** \ _ \ __) are handled differently, so that the parser doesn't lose time re-evaluating unclosed sequences.
In order to handle such repetitive sequences correctly, we need to come up with a generic recovery mechanism applicable to all sequences with the structure: opening char, inner sequence, closing char.
It's an interesting problem for PEG parsers in general. Not sure whether this type of problem has already been debated; it needs investigation.
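One generic mechanism that fits the "opening char, inner sequence, closing char" structure is packrat-style memoization: cache each rule's result per input position, so a failed link attempt is never re-evaluated when the parser falls back and retries the same suffix. The sketch below applies this to the same toy grammar (`inline = link / literal`, `link = '[' inline* ']'`) and is an illustration of the idea only, not pegdown's implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class MemoizedBracketDemo {
    private static long attempts;
    private static Map<Integer, Integer> memo;

    /** Counts only the cache-missing (i.e. real) inline-rule evaluations. */
    public static long countAttempts(String s) {
        attempts = 0;
        memo = new HashMap<>();
        parseInlineSeq(s, 0);
        return attempts;
    }

    private static int parseInlineSeq(String s, int i) {
        while (i < s.length()) {
            int j = parseInline(s, i);
            if (j < 0) break;
            i = j;
        }
        return i;
    }

    // inline = link / any char except ']', memoized per position. This is
    // sound because a PEG rule's outcome depends only on where it starts,
    // not on the surrounding context.
    private static int parseInline(String s, int i) {
        Integer cached = memo.get(i);
        if (cached != null) return cached;
        attempts++;                          // only cache misses do real work
        int result = parseLink(s, i);
        if (result < 0) result = s.charAt(i) == ']' ? -1 : i + 1;
        memo.put(i, result);
        return result;
    }

    private static int parseLink(String s, int i) {
        if (s.charAt(i) != '[') return -1;
        int j = parseInlineSeq(s, i + 1);
        if (j < s.length() && s.charAt(j) == ']') return j + 1;
        return -1;
    }

    public static void main(String[] args) {
        String input = "[".repeat(37) + "aa";  // the pathological shape from this issue
        System.out.println(countAttempts(input) + " attempts for " + input.length() + " chars");
    }
}
```

With the memo table, each position is evaluated exactly once, so the work is linear in the input length instead of exponential. If I remember correctly, parboiled (which pegdown builds on) offers a @MemoMismatches annotation in a similar spirit, though it memoizes mismatches only.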
I see. I really hope there is a good solution.
Hey @Elmervc. We use Pegdown on Atlassian Stash and we just had a customer run into this problem in production by accident, and it unfortunately requires a restart of the server. I can imagine this might be tricky to solve properly, but would it be possible to sprinkle a few checkForParsingTimeout() calls around to at least put a cap on the parsing time?
I'm happy to make the appropriate changes with tests if that's agreeable to you?
Just FTR: The maxParsingTimeInMillis parameter of the Parser class should help in limiting parser runtime on pathological input.
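For anyone landing here, a deadline cap of this kind is simple to sketch. The following is a minimal, self-contained illustration of the idea, assuming a naive backtracking bracket parser as a stand-in for the pathological case described in this issue; apart from the names checkForParsingTimeout and maxParsingTimeInMillis, which come straight from this thread, all names are made up and this is not pegdown's actual internals.

```java
public class ParsingTimeoutDemo {
    static class ParsingTimeoutException extends RuntimeException {}

    private final long deadline;

    ParsingTimeoutDemo(long maxParsingTimeInMillis) {
        deadline = System.currentTimeMillis() + maxParsingTimeInMillis;
    }

    // Cheap check, intended to be sprinkled into hot parsing rules.
    private void checkForParsingTimeout() {
        if (System.currentTimeMillis() > deadline) {
            throw new ParsingTimeoutException();
        }
    }

    // Deliberately naive backtracking parser for '[' ... ']' nesting,
    // which blows up on long runs of unclosed '['.
    private int parseInlineSeq(String s, int i) {
        while (i < s.length()) {
            int j = parseInline(s, i);
            if (j < 0) break;
            i = j;
        }
        return i;
    }

    private int parseInline(String s, int i) {
        checkForParsingTimeout();   // the cap: abort instead of spinning forever
        int j = parseLink(s, i);
        if (j >= 0) return j;
        return s.charAt(i) == ']' ? -1 : i + 1;
    }

    private int parseLink(String s, int i) {
        if (s.charAt(i) != '[') return -1;
        int j = parseInlineSeq(s, i + 1);
        if (j < s.length() && s.charAt(j) == ']') return j + 1;
        return -1;
    }

    /** Returns true if parsing s was aborted by the deadline. */
    public static boolean timesOut(String s, long maxMillis) {
        try {
            new ParsingTimeoutDemo(maxMillis).parseInlineSeq(s, 0);
            return false;
        } catch (ParsingTimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String pathological = "[".repeat(60) + "aa";
        System.out.println("timed out: " + timesOut(pathological, 100));
    }
}
```

Note that a check placed only in a coarse-grained rule can be skipped for a long time if the parser spins inside an inner rule, which may be why an aggressive limit still fails to fire on some inputs, as reported below.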
@sirthias We found a particular markdown text that didn't time out, even though we have a fairly aggressive limit set. Does that sound right? I can post the text if that helps?
Feel free to extend the code with parsing timeout checks :+1:
And yes, you can post the text here. Maybe I will look into the problem again and combine it with #113 .
Here it is:
how about a new method thats getObjectIdOrAdjustmentGroup? That w[a[[[[[[[[[[[[[[[[[y we're more explicit and still benefit callers from having to do the iff dance
If I get time I'll see where/if any appropriate timeout can help.
Ok, thanks for reporting! I'll happily merge a PR providing a fix for this (and the respective test-case... :). Thanks, guys!
I've pushed a fix/test for this specific issue. If I had more time (and motivation) I'd like to know:
a) Whether there's a better place to add these timeout checks to guarantee this doesn't happen for different inputs (e.g. what would the cost be of adding checkForParsingTimeout() to most/all Sequence calls?)
b) Whether this rule can be tweaked so it doesn't need to time out at all
Let me know what you think and/or if I should be doing something slightly different. Thanks in advance.