It takes a long time to parse consecutive "[".
It takes a long time to parse consecutive "[". For example, text like this: "[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[aa"
Why would anyone write dozens of consecutive [? o.O
Actually, I wrote the log in Markdown.
2688 [18/Aug/2013:18:24 2735 [18/Aug/2013:18:25 2934 [18/Aug/2013:18:26 2786 [18/Aug/2013:18:27 3043 [18/Aug/2013:18:28 3361 [18/Aug/2013:18:29 3366 [18/Aug/2013:18:30 3323 [18/Aug/2013:18:31 3496 [18/Aug/2013:18:32 3139 [18/Aug/2013:18:33 3069 [18/Aug/2013:18:34 3071 [18/Aug/2013:18:35 2780 [18/Aug/2013:18:36 2183 [18/Aug/2013:18:37 2016 [18/Aug/2013:18:38 2206 [18/Aug/2013:18:39
Similar problem as https://github.com/sirthias/pegdown/issues/43
Pegdown tries to nest each opening [ as the (probable) start of a link after the first [, but cannot find enough closing ] to arrive at a valid tree. It then treats the first [ as a literal special character and repeats the same traversal over and over again for each of the following opening ['s.
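To make the blow-up concrete, here is a self-contained sketch of that backtracking pattern (this is illustrative, not pegdown's actual code): a naive recursive-descent parser for the toy grammar `inline = link / literal`, `link = '[' inline* ']'`. Every unmatched [ is first tried as a link start; when no ] is found, the whole inner parse is thrown away and the same suffix is re-parsed with the [ demoted to a literal, which makes the number of rule attempts grow exponentially with the number of unclosed brackets.

```java
public class BracketBacktrackDemo {
    private static long attempts;

    /** Counts how many times the inline rule is attempted while parsing s. */
    public static long countAttempts(String s) {
        attempts = 0;
        parseInlineSeq(s, 0);
        return attempts;
    }

    // inline*: consume as many inlines as possible, return the end position.
    private static int parseInlineSeq(String s, int i) {
        while (i < s.length()) {
            int j = parseInline(s, i);
            if (j < 0) break;
            i = j;
        }
        return i;
    }

    // inline = link / any char except ']'. Returns the new position, or -1.
    private static int parseInline(String s, int i) {
        attempts++;
        int j = parseLink(s, i);
        if (j >= 0) return j;
        return s.charAt(i) == ']' ? -1 : i + 1;
    }

    // link = '[' inline* ']'. On a missing ']' the entire inner parse is
    // discarded -- this is where the exponential re-evaluation comes from.
    private static int parseLink(String s, int i) {
        if (s.charAt(i) != '[') return -1;
        int j = parseInlineSeq(s, i + 1);
        if (j < s.length() && s.charAt(j) == ']') return j + 1;
        return -1;
    }

    public static void main(String[] args) {
        // Attempts grow roughly as 3 * 2^k for k unclosed brackets.
        for (int k = 5; k <= 20; k += 5) {
            String input = "[".repeat(k) + "aa";
            System.out.println(k + " brackets -> " + countAttempts(input) + " attempts");
        }
    }
}
```

With 37 unclosed brackets, as in the reported input, this toy parser would need on the order of 10^11 rule attempts, which matches the "takes a long time" symptom.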
Thank you, I understand the problem now. Will it be fixed?
It is a general problem with the design of the parser. It will trigger with almost any stream of opening characters, e.g. also with <. Currently, only emphasis and strong (* \ ** \ _ \ __) are handled differently, so that the parser doesn't lose time re-evaluating unclosed sequences.
In order to handle such repetitive sequences correctly, we need to come up with a generic recovery mechanism applicable to all sequences with the structure: opening char, inner sequence, closing char.
It's an interesting problem for PEG parsers in general. Not sure whether this type of problem has already been debated; it needs investigation.
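One generic mechanism that fits the "opening char, inner sequence, closing char" structure is packrat-style memoization: cache each rule's result per input position, so a failed link attempt is never re-evaluated when the parser falls back and retries the same suffix. The sketch below applies this to the same toy grammar (`inline = link / literal`, `link = '[' inline* ']'`) and is an illustration of the idea only, not pegdown's implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class MemoizedBracketDemo {
    private static long attempts;
    private static Map<Integer, Integer> memo;

    /** Counts only the cache-missing (i.e. real) inline-rule evaluations. */
    public static long countAttempts(String s) {
        attempts = 0;
        memo = new HashMap<>();
        parseInlineSeq(s, 0);
        return attempts;
    }

    private static int parseInlineSeq(String s, int i) {
        while (i < s.length()) {
            int j = parseInline(s, i);
            if (j < 0) break;
            i = j;
        }
        return i;
    }

    // inline = link / any char except ']', memoized per position. This is
    // sound because a PEG rule's outcome depends only on where it starts,
    // not on the surrounding context.
    private static int parseInline(String s, int i) {
        Integer cached = memo.get(i);
        if (cached != null) return cached;
        attempts++;                          // only cache misses do real work
        int result = parseLink(s, i);
        if (result < 0) result = s.charAt(i) == ']' ? -1 : i + 1;
        memo.put(i, result);
        return result;
    }

    private static int parseLink(String s, int i) {
        if (s.charAt(i) != '[') return -1;
        int j = parseInlineSeq(s, i + 1);
        if (j < s.length() && s.charAt(j) == ']') return j + 1;
        return -1;
    }

    public static void main(String[] args) {
        String input = "[".repeat(37) + "aa";  // the pathological shape from this issue
        System.out.println(countAttempts(input) + " attempts for " + input.length() + " chars");
    }
}
```

With the memo table, each position is evaluated exactly once, so the work is linear in the input length instead of exponential. If I remember correctly, parboiled (which pegdown builds on) offers a @MemoMismatches annotation in a similar spirit, though it memoizes mismatches only.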
I see. I really hope there is a good solution.
Hey @Elmervc. We use Pegdown on Atlassian Stash and we just had a customer run into this problem in production by accident, and it unfortunately requires a restart of the server. I can imagine this might be tricky to solve properly, but would it be possible to sprinkle a few checkForParsingTimeout() calls around to at least put a cap on the parsing time?
I'm happy to make the appropriate changes with tests if that's agreeable to you?
Just FTR: The maxParsingTimeInMillis parameter of the Parser class should help in limiting parser runtime on pathological input.
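For anyone landing here, a deadline cap of this kind is simple to sketch. The following is a minimal, self-contained illustration of the idea, assuming a naive backtracking bracket parser as a stand-in for the pathological case described in this issue; apart from the names checkForParsingTimeout and maxParsingTimeInMillis, which come straight from this thread, all names are made up and this is not pegdown's actual internals.

```java
public class ParsingTimeoutDemo {
    static class ParsingTimeoutException extends RuntimeException {}

    private final long deadline;

    ParsingTimeoutDemo(long maxParsingTimeInMillis) {
        deadline = System.currentTimeMillis() + maxParsingTimeInMillis;
    }

    // Cheap check, intended to be sprinkled into hot parsing rules.
    private void checkForParsingTimeout() {
        if (System.currentTimeMillis() > deadline) {
            throw new ParsingTimeoutException();
        }
    }

    // Deliberately naive backtracking parser for '[' ... ']' nesting,
    // which blows up on long runs of unclosed '['.
    private int parseInlineSeq(String s, int i) {
        while (i < s.length()) {
            int j = parseInline(s, i);
            if (j < 0) break;
            i = j;
        }
        return i;
    }

    private int parseInline(String s, int i) {
        checkForParsingTimeout();   // the cap: abort instead of spinning forever
        int j = parseLink(s, i);
        if (j >= 0) return j;
        return s.charAt(i) == ']' ? -1 : i + 1;
    }

    private int parseLink(String s, int i) {
        if (s.charAt(i) != '[') return -1;
        int j = parseInlineSeq(s, i + 1);
        if (j < s.length() && s.charAt(j) == ']') return j + 1;
        return -1;
    }

    /** Returns true if parsing s was aborted by the deadline. */
    public static boolean timesOut(String s, long maxMillis) {
        try {
            new ParsingTimeoutDemo(maxMillis).parseInlineSeq(s, 0);
            return false;
        } catch (ParsingTimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String pathological = "[".repeat(60) + "aa";
        System.out.println("timed out: " + timesOut(pathological, 100));
    }
}
```

Note that a check placed only in a coarse-grained rule can be skipped for a long time if the parser spins inside an inner rule, which may be why an aggressive limit still fails to fire on some inputs, as reported below.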
@sirthias We found a particular markdown text that didn't time out, even though we have a fairly aggressive limit set. Does that sound right? I can post the text if that helps?
Feel free to extend the code with parsing timeout checks :+1:
And yes, you can post the text here. Maybe I will look into the problem again and combine it with #113 .
Here it is:
how about a new method thats getObjectIdOrAdjustmentGroup? That w[a[[[[[[[[[[[[[[[[[y we're more explicit and still benefit callers from having to do the iff dance
If I get time I'll see where/if any appropriate timeout can help.
Ok, thanks for reporting! I'll happily merge a PR providing a fix for this (and the respective test-case... :). Thanks, guys!
I've pushed a fix/test for this specific issue. If I had more time (and motivation) I'd like to know:
a) Whether there's a better place to add these timeout checks to guarantee this doesn't happen for different inputs (e.g. what would the cost be of adding checkForParsingTimeout() to most/all Sequence calls?)
b) Whether this rule can be tweaked so it doesn't need to time out at all
Let me know what you think and/or if I should be doing something slightly different. Thanks in advance.