SafeRe is vulnerable to ReDoS
Step 1: Please describe your environment
- ZeroNet version: 0.7.2 (4555)
Step 2: Describe the problem:
"To avoid the ReDoS algorithmic complexity attack", the function below is used to validate user-defined regular expressions:
https://github.com/HelloZeroNet/ZeroNet/blob/454c0b2e7e000fda7000cba49027541fbf327b96/src/util/SafeRe.py#L10-L22
This function fails to identify regular expressions that can take exponential time to match some inputs.
Steps to reproduce:
>>> from SafeRe import isSafePattern, match
>>> p = "a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
>>> isSafePattern(p)
True
>>> match(p, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
Observed Results:
match hangs, and execution does not complete in any reasonable amount of time.
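A toy backtracking matcher shows why the call hangs. The sketch below is not Python's re engine: backtrack_match and evil_case are hypothetical names, and the matcher supports only literal characters and a greedy "?". Like a backtracking engine, it tries the "consume" branch of each a? first and only skips it after that whole branch has failed. It counts matcher calls for the pattern shape from the report.

```python
def backtrack_match(tokens, text):
    """Greedy backtracking prefix match (like re.match); returns (matched, calls).

    tokens is a list of (char, optional) pairs, e.g. "a?a" -> [("a", True), ("a", False)].
    """
    calls = 0

    def go(ti, pos):
        nonlocal calls
        calls += 1
        if ti == len(tokens):
            return True  # every token matched; trailing text is allowed
        ch, optional = tokens[ti]
        can_consume = pos < len(text) and text[pos] == ch
        if optional:
            # Greedy '?': first try consuming the character, then try skipping it.
            if can_consume and go(ti + 1, pos + 1):
                return True
            return go(ti + 1, pos)
        return can_consume and go(ti + 1, pos + 1)

    return go(0, 0), calls


def evil_case(n):
    """Tokens for "a?" * n + "a" * n, matched against "a" * n (the report uses n = 50)."""
    tokens = [("a", True)] * n + [("a", False)] * n
    return backtrack_match(tokens, "a" * n)


for n in range(1, 16):
    matched, calls = evil_case(n)
    print(n, matched, calls)
```

In this model the match always succeeds eventually, because every a? can be skipped. However, only the very last path tried (all a? skipped) works, and the call count more than doubles with each additional a?. For n = 50 this model needs more than 2^50 calls. Python's re also backtracks, which is consistent with the observed hang, although its exact step counts differ.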
Expected Results:
isSafePattern should properly detect that the pattern is unsafe.
Alternatively, match should use an algorithm that guarantees linear time complexity for compiling patterns and matching inputs (e.g. a Thompson NFA).
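A linear-time matcher avoids backtracking by tracking the set of NFA states that are active after each input character, rather than exploring one path at a time. The sketch below illustrates this Thompson-style state-set simulation for a deliberately tiny, assumed subset of the syntax: literal characters, ".", and a single "?". nfa_match and its helpers are hypothetical names, not part of SafeRe. A real replacement would need to support the full syntax that zites use.

```python
def parse(pattern):
    """Split a pattern into (char, optional) tokens.

    Supported subset (an assumption for this sketch): literals, '.', and a '?'
    directly after a literal or '.'.
    """
    tokens = []
    for ch in pattern:
        if ch == "?":
            if not tokens or tokens[-1][1]:
                raise ValueError("unsupported '?' in %r" % pattern)
            tokens[-1] = (tokens[-1][0], True)
        elif ch in "*+{}()[]|\\^$":
            raise ValueError("unsupported metacharacter %r in %r" % (ch, pattern))
        else:
            tokens.append((ch, False))
    return tokens


def closure(states, tokens):
    """Add every state reachable by skipping optional tokens (epsilon edges)."""
    result = set(states)
    stack = list(states)
    while stack:
        i = stack.pop()
        if i < len(tokens) and tokens[i][1] and i + 1 not in result:
            result.add(i + 1)
            stack.append(i + 1)
    return result


def char_matches(token_char, ch):
    # '.' matches any character except a newline, as in Python's re.
    return token_char == ch or (token_char == "." and ch != "\n")


def nfa_match(pattern, text):
    """Prefix match like bool(re.match(...)) in O(len(pattern) * len(text)) time."""
    tokens = parse(pattern)
    accept = len(tokens)  # state i means "tokens[:i] have been matched"
    states = closure({0}, tokens)
    for ch in text:
        if accept in states:
            return True  # a prefix of text already matches
        states = closure(
            {i + 1 for i in states if i < accept and char_matches(tokens[i][0], ch)},
            tokens,
        )
        if not states:
            return False  # no live states left, so no match is possible
    return accept in states


evil = "a?" * 50 + "a" * 50
print(nfa_match(evil, "a" * 50))  # True, without any backtracking
```

Each input character is processed once against at most len(pattern) + 1 states, so the pattern from the report is matched immediately. RE2 likewise relies on automata simulation rather than backtracking.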
We could replace this with RE2 (https://github.com/google/re2). Python bindings are available (https://pypi.org/project/google-re2/).
@rllola
Many zites use negative lookahead (?!...), and RE2 doesn't seem to support it (https://github.com/google/re2/wiki/Syntax).
The problem is that we neither check patterns against a formally allowed regexp syntax nor have such a formal definition at all. Our regexp syntax is implicitly Python re syntax.
I'm not sure it is possible to move to RE2 in a backward-compatible way.
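To gauge how many existing patterns a switch to RE2 would break, one could scan stored patterns for constructs that RE2 rejects. The sketch below is a rough, hypothetical heuristic (re2_incompatibilities is an illustrative name). It flags lookahead and lookbehind groups and numeric backreferences, which the RE2 syntax page linked above lists as unsupported. The list is not exhaustive, and because the function does not fully parse patterns, it can misfire on escaped characters or character classes.

```python
import re

# Constructs that Python's re accepts but RE2 does not (non-exhaustive).
UNSUPPORTED = [
    ("lookahead", re.compile(r"\(\?[=!]")),     # (?=...) and (?!...)
    ("lookbehind", re.compile(r"\(\?<[=!]")),   # (?<=...) and (?<!...)
    ("backreference", re.compile(r"\\[1-9]")),  # \1 .. \9
]


def re2_incompatibilities(pattern):
    """Return the names of RE2-unsupported constructs found in pattern."""
    return [name for name, rx in UNSUPPORTED if rx.search(pattern)]


patterns = [
    "data/users/.*/content.json",
    "^(?!.*private).*",
]
for p in patterns:
    print(p, re2_incompatibilities(p) or "ok")
```

Running a scan like this over the patterns that zites actually publish would show whether a backward-compatible migration is realistic or whether a fallback path would be needed.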
https://github.com/zeronet-enhanced/ZeroNet/commit/2a25d61b968a21aa98c6db2ca9d64f1bbdc54773
In my fork, I (temporarily) fixed this by treating ?s the same way as other "repetitions", so the total number of repetition markers cannot exceed 9.
I'm not sure whether it is a proper or complete solution; I'm not familiar with ReDoS attacks or regexp implementation details.
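Below is a minimal sketch of the idea described above. It assumes a rule of the form "count every repetition marker, including ?, and reject the pattern if there are more than 9". This is not the fork's actual code (see the commit link for that). count_repetition_markers and is_safe_pattern_sketch are hypothetical names.

```python
import re

MAX_REPETITIONS = 9  # limit from the comment above: at most 9 markers in total

# '*', '+', '{' and '?' are all treated as repetition markers.
# Caveat: this naive scan also counts escaped characters (e.g. r"\?")
# and the '?' in group syntax such as "(?!...)".
REPETITION_MARKER = re.compile(r"[*+?{]")


def count_repetition_markers(pattern):
    return len(REPETITION_MARKER.findall(pattern))


def is_safe_pattern_sketch(pattern):
    count = count_repetition_markers(pattern)
    if count > MAX_REPETITIONS:
        raise ValueError("%d repetition markers (max %d)" % (count, MAX_REPETITIONS))
    return True


evil = "a?" * 50 + "a" * 50
try:
    is_safe_pattern_sketch(evil)
except ValueError as err:
    print(err)  # 50 repetition markers (max 9)
```

Bounding the marker count limits how many optional branches a pattern like the one in the report can contain. Whether it rules out every exponential case is the open question raised above.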