ModSecurity Add isolated PCRE match limits as a layer of ReDoS defense

This PR ...

Adds support for PCRE and PCRE2 match limits as a last line of defense against ReDoS. Match limits are applied per PCRE invocation and do not change the global PCRE match limit default, which means the limits are isolated to ModSecurity and shouldn't affect other uses of PCRE in the same program.
Exposes regex errors as the variables RX_ERROR and RX_ERROR_RULE_ID so that regex engine failures can be handled by dedicated rules, rather than silently ignored. Today, if regex execution returns an error, the rx and rxGlobal operators simply fail to match. This could even be considered a security concern since such a regex error will cause a rule to be quietly bypassed.

NOTE: In the future, ModSecurity may support more proactive measures to identify vulnerable regular expressions before they can be used, but there is no harm in keeping multiple layers of defense. Doing so may even be wise, in the same way we design modern buildings to avoid fires yet still employ fire fighters for emergencies.

Background

A similar PR was rejected in the past, and if I understand correctly, it was rejected for the following reasons:

A belief that setting PCRE match limits would effect other PCRE users in the same memory space #
Users were concerned and confused about how global PCRE match limits, set for things like PHP, also affected ModSecurity #
The maintainers desire to avoid entanglement and user confusion with PCRE limits by finding other ways to avoid ReDoS #

If that is correct, I think it is worth revisiting these reasons:

Can we set match limits without affecting other PCRE users in the same memory space? Yes.
Match limits can be applied to individual PCRE pcre_exec() and PCRE2 pcre2_match() invocations without affecting other PCRE uses in the same memory space. Intuitively, if custom settings can be passed to matching functions, those settings should be limited to the function call. But we can also find evidence in the PCRE source code. In the legacy version of PCRE, the pcre_exec() function has a local match_data structure that includes the selected match limit setting. The structure is declared as part of the stack frame here. It is assigned the default match limit value here and, if specified, the custom match limit value here. Since the custom setting is maintained as part of the stack frame, it should have no global effects on other PCRE users or even other subsequent PCRE calls from ModSecurity.
Is ModSecurity affected by changes to the global PCRE default match limit? Today, the answer is yes, but there is a straightforward fix.
ModSecurity can isolate itself from PCRE's default by choosing its own default match limit (even just PCRE's normal default of 10,000,000) and always specifying that limit when invoking pcre_exec() or pcre2_match(). If ModSecurity did this for all PCRE limits instead of relying upon PCRE defaults, ModSecurity users could be isolated from the effects of modified PCRE defaults.
Whether to allow regex engine configuration is up to the ModSecurity maintainers, but some ModSecurity users want this level of configurability.
Eliminating the possibility of ReDoS prior to invoking PCRE or another regex engine is a noble goal, but such a feature does not exist in ModSecurity today. In the meantime, ModSecurity users have real ReDoS concerns that can be addressed through regular expression engine configuration, and such configuration could co-exist peacefully with more proactive anti-ReDoS measures when they are developed.

Implementation

This PR re-enables the SecPcreMatchLimit config directive, saves the match limit as a transaction property, and applies the limit in the rx and rxGlobal operators. This works with both PCRE and the experimental PCRE2 support.

This PR also adds RX_ERROR and RX_ERROR_RULE_ID variables which are set in the event of a regular expression error. RX_ERROR exposes the reason for error and can be set to either "MATCH_LIMIT" or "OTHER". RX_ERROR_RULE_ID exposes the rule ID where the regex error happened so error-handling rules can be written to respond differently based on a specific rule ID or range of rule IDs.

Here is a naive example of rules written against the RX error variables:

# Deny requests when rule IDs in the 2xxx range encounter a regex error
SecRule RX_ERROR_RULE_ID "@beginsWith 2" "id:3330,phase:2,auditlog,deny,msg:'Class 2xxx rule %{RX_ERROR_RULE_ID} encountered regex error %{RX_ERROR}',chain"
		SecRule RX_RULE_ERROR_ID "@eq 4" "t:length"

# Log when other rules encounter a regex error
SecRule RX_ERROR "@gt 0" "t:length,id:3331,phase:2,log,auditlog,allow,msg:'Rule %{RX_ERROR_RULE_ID} encountered regex error %{RX_ERROR}'"

Review

Arguably, regex engine configuration and regex error handling could be split into two PRs, but I'm starting with a single PR for the sake of discussion.

What do you think? Is this a reasonable change? Would you be willing to reconsider allowing users to set ModSecurity-specific PCRE match limits?

May 03 '22 21:05 brandonpayton

Hello @brandonpayton ,

Thanks for the contribution.

While that earlier PR predates my involvement in the project, I do know and agree that concern about limits intended for ModSecurity having effects beyond ModSecurity was indeed a reason to avoid some possible implementations. I don't know that there were any such tangible reasons associated with not considering the per-transaction overrides that you are proposing here.

I have not examined the code in this PR in detail, but I'll make a few observations to start with:

I am open to considering this; as long as the wider side-effects mentioned above can be avoided (which certainly seems to be the case, based on the documentation), this is a reasonable subject to revisit
I think I like your idea of having one or two new error variables available for this condition; without a signal like that available for additional rules to examine, we have a rather uncomfortable scenario where neither 'assume match' nor 'assume no match' can be considered a universally 'safe' option
There is, in principal, an intention to support other regex engines besides pcre/pcre2 (Hyperscan and RE2) and it's always preferable if the same options are supported in the other engines as well. However, even if that is not the case, and those per-transaction limits can only be made to work with pcre/pcre2, I'm not sure that's a good enough reason to reject it

May 09 '22 14:05 martinhsv

Hi @martinhsv, thank you for your feedback on this PR. I'm happy to hear you are open to changes of this nature.

There is, in principal, an intention to support other regex engines besides pcre/pcre2 (Hyperscan and RE2) and it's always preferable if the same options are supported in the other engines as well.

The regex util updates in this PR just add a basic match_limit arg to a couple of the regex utility matching methods. It sounds like that argument may need to be converted into general engine options at some point, but since the regex util is an internal implementation detail, perhaps that change can wait until it is needed.

Is there anything you'd like to see to help move this PR forward? Just a heads up: I am taking a sabbatical for the next few months and may take a while to respond, but I am still very much interested in this change.

Note: I rebased and resolved merge conflicts but haven't had a chance to test that update yet.

Aug 15 '22 04:08 brandonpayton

IMO, it's better to use atomic group or possessive quantifiers for regex rules.

Aug 18 '22 03:08 liudongmiao

IMO, it's better to use atomic group or possessive quantifiers for regex rules.

Hi @liudongmiao, do you mean that it would be better if rule authors wrote better, safer regular expressions? If so, I agree.

The purpose of this PR is to serve as a safety net when loading rules written by others. One example is the OWASP Core Rule Set. We have encountered issues with some of these rules in production. Of course, it would be better to make fixes to individual rules, but to avoid encountering new problems in production, it is valuable to have a safety net like these match limits.

Nov 18 '22 15:11 brandonpayton

Hello @martinhsv,

Would you like me to make any changes to this PR? Is there anything I can do to help this PR or something like it to land?

My employer Automattic is interested in using this regex safety net so we can more comfortably leverage community rule sets like those from OWASP. We have seen some ReDoS issues with open source rule sets and would like to mitigate the risk.

Nov 18 '22 15:11 brandonpayton

Hello @brandonpayton ,

Yes, this is one of the items that I'd hoped to get back to during the coming period. I'll review this within the next few days.

Nov 18 '22 22:11 martinhsv

Hi @brandonpayton ,

There are a few design assumptions here that I think might be worth considering.

The population of your two new variables is 'first wins' -- i.e. they only get set if they have not already been set.

In general ModSecurity has a 'last wins' approach. This is true for variables like MATCHED_VAR and for pcre API matches as well (a repeated group like (a.c)* with input 'abcadc' will return only 'adc' for that capture group). Using 'last wins' would also give the admin slightly more flexibility for collecting information. I.e. it's possible for the admin to either do something with the changing variable periodically, whereas with 'first wins' that additional avenue of inquiry doesn't work.

If we continue with only holding a single value, I would suggest we use 'last wins' instead. Counterarguments are welcome.

But this bleeds into some bigger questions.

I'm somewhat hesitant about the proposed new variable RX_ERROR_RULE_ID.

This is not something ModSecurity has done before. Default rule-specific processing information tends to be communicated through ModSecurity's Debug Log.

I'm not necessarily rejecting the concept of maybe providing this sort of information through a variable. After all, some installations may prefer to run without the debug log engaged, and having some basic information available in such cases may be helpful.

However, if we do wish to pursue that goal, we might wonder if we really want to restrict the admin to only knowing about 1 of possibly multiple failures. Also, we may want to think more broadly: a processing failure that is related to a specific rule may be relevant to other types of operators besides rx. Maybe we should really be considering a new variable in the form of a collection like RULE_ERROR where each key is the rule id where a failure was seen, and the value is a fixed text describing the error.

I'm not sure I've thought this through fully; and implementing an idea like that might be better done separately, as then it would have better visibility with (and potential input from) the community.

As an immediate measure to get your central goal moving forward, I'm wondering if it might be fruitful to limit the near term change to the existing model: set a single signalling variable (in ModSecurity v2, this was called MSC_PCRE_LIMITS_EXCEEDED) and leave additional information communication to the Debug Log.

Thoughts and counterarguments are welcome.

Nov 29 '22 18:11 martinhsv

Hi @martinhsv,

Thank you for your thought and guidance. What you suggested sounds reasonable and wise.

In general ModSecurity has a 'last wins' approach.

It's good to hear about "last wins" versus "first wins", and I do not believe there is any good reason for this to deviate from overall ModSecurity philosophy and practice.

I'm somewhat hesitant about the proposed new variable RX_ERROR_RULE_ID.

This is not something ModSecurity has done before. Default rule-specific processing information tends to be communicated through ModSecurity's Debug Log.

OK, it seems like we should continue favoring the Debug Log, especially for this iteration.

Maybe we should really be considering a new variable in the form of a collection like RULE_ERROR where each key is the rule id where a failure was seen, and the value is a fixed text describing the error.

At the time, I was not sure how to do something like this or whether it was possible. If the community ends up wanting something like this, it sounds good to add in a follow-up patch.

implementing an idea like that might be better done separately, as then it would have better visibility with (and potential input from) the community.

That's a good point about community input. Agreed.

As an immediate measure to get your central goal moving forward, I'm wondering if it might be fruitful to limit the near term change to the existing model: set a single signalling variable (in ModSecurity v2, this was called MSC_PCRE_LIMITS_EXCEEDED) and leave additional information communication to the Debug Log.

Do I understand correctly that we are talking about dropping the RX_ERROR and RX_ERROR_RULE_ID variables for now and adding MSC_PCRE_LIMITS_EXCEEDED as a boolean signaling variable? If so, that sounds great.

Dec 01 '22 16:12 brandonpayton

Because there is also the possibility of other PCRE errors, we may need an MSC_PCRE_ERROR flag to reflect that. Otherwise, those PCRE errors will cause rules to be bypassed, and we cannot write rules to detect and handle it.

Dec 11 '22 00:12 brandonpayton

I've updated this PR to use the simpler flag variables MSC_PCRE_ERRORED and MSC_PCRE_LIMITS_EXCEEDED. The first is set whenever there is any PCRE exec error, and the second is set only when hitting PCRE match limits.

I don't love using multiple flag vars but believe no PCRE exec errors should pass completely silently because it means the rule where the error occurred was effectively ignored. Perhaps, if two flag vars are undesirable, using a single MSC_PCRE_ERRORED flag along with debug logging would be sufficient for now. What do you think?

Dec 12 '22 03:12 brandonpayton

I know that libmodsecurity3 has only one type of message in error log (we can call it as a "regular" message), while mod_security2 has also the "Execution error" type of messages.

I'm sure it would hard to add, but would you consider adding a similar log message (to the Nginx's error.log particularly) if the engine runs into a match limit, as like the mod_security2? (See here and here).

The line seems something like this:

...[client 1.2.3.4] ModSecurity: Rule 7fac7d451d38 [id "-"][file "/usr/share/modsecurity-crs/rules/RESPONSE-951-DATA-LEAKAGES-SQL.conf"][line "345"] - Execution error - PCRE limits exceeded (-8): (null). [hostname  "...

and appears without triggering of any rules. That would be very useful.

Dec 28 '22 21:12 airween

Thanks, @martinhsv. I just pushed some changes to address your review comments. Please let me know if you'd like something adjusted.

Jan 20 '23 01:01 brandonpayton

I know that libmodsecurity3 has only one type of message in error log (we can call it as a "regular" message), while mod_security2 has also the "Execution error" type of messages.

I'm sure it would hard to add, but would you consider adding a similar log message (to the Nginx's error.log particularly) if the engine runs into a match limit, as like the mod_security2? (See here and here).

Hi @airween, thank you for mentioning this. The idea seems pretty reasonable as execution errors are generally undesired and unexpected, but I think it might be better to tackle this in a separate PR.

Mar 08 '23 18:03 brandonpayton

Hi @martinhsv, is there anything left to do under this PR?

Mar 08 '23 19:03 brandonpayton

Hi @martinhsv, thank you for the review.

This does not implement the limit when a regex is in a place like the variable specification (e.g. 'SecRule ARGS:/^parm[0-9]/ "phase:2 ...'. I assume that's on purpose and I think that's reasonable. Regular expressions in that position are very unlikely to cause ReDoS worry.

Honestly, it would be nice to eventually support match limits there wherever regular expressions are used in processing transactions, but the initial focus was on the rx operators.

Do you want to rebase this PR so that it's mergeable?

Done.

We should probably include an automated test for this. If you're not sure how to do this, if you have an example handy (configured limit, regex, input string), I can do so after the fact

Initially, I did not see how to add tests, but with your encouragement, I dug a bit and found the test setup is fairly intuitive. There is now a test for the rx operator, testing the MSC_PCRE_ERROR and MSC_PCRE_LIMITS_EXCEEDED flags.

Unfortunately, I did not have any luck triggering the error condition with a test for the rxGlobal operator, at least with PCRE2, so I have not added a test there yet. It is a bit puzzling but may have something to do with the PCRE2 options used by Regex::searchGlobal().

Apr 11 '23 18:04 brandonpayton

Unfortunately, I did not have any luck triggering the error condition with a test for the rxGlobal operator, at least with PCRE2, so I have not added a test there yet. It is a bit puzzling but may have something to do with the PCRE2 options used by Regex::searchGlobal().

I remember triggering such an error while testing this patch and @rxGlobal manually via ModSecurity-nginx, so I'm not sure where the disconnect is.

Apr 11 '23 18:04 brandonpayton

Hi @martinhsv, regarding the QA check failures in the check-static step, I ran make check-static and found there are some warnings around an unused variables because they are only used with ms_dbg_a().

warning: src/operators/rx.cc,73,style,unreadVariable,Variable 'regex_error_str' is assigned a value that is never used.
warning: src/operators/rx.cc,75,style,unreadVariable,Variable 'regex_error_str' is assigned a value that is never used.
...
warning: src/operators/rx_global.cc,67,style,unreadVariable,Variable 'regex_error_str' is assigned a value that is never used.
warning: src/operators/rx_global.cc,69,style,unreadVariable,Variable 'regex_error_str' is assigned a value that is never used.

It looks like there is a cppcheck suppression entry for rule_with_operator.cc that appears to be for a similar reason (declaration and use with ms_dbg_a). Should I add entries for these to test/cppcheck_suppressions.txt or adjust this in some other way?

May 06 '23 04:05 brandonpayton

Actually, I am not sure whether warnings cause the failure but haven't yet found another cause for the cppcheck failure.

May 06 '23 04:05 brandonpayton

Merged.

There were a couple of things cppcheck was flagging: a pre-existing exclusion whose line number needed adjusting and the couple of (false positive) unreadVariable alerts. I've handled those.

I've also made a small whitespace adjustment from your final revision to the PR, but I think we're all good now.

Thanks for the contribution.

Note to others that this PR incidentally advanced the bison version for generated files to 3.8.2. That had been planned for this spring in any case but was taken care of here.

May 11 '23 15:05 martinhsv

Great! Thanks for all your guidance and patience, @martinhsv. I've learned a lot more about ModSecurity while working on this PR.

May 11 '23 18:05 brandonpayton