
Switch regexes from NSRegularExpression to RE2

Open russellhancox opened this issue 10 years ago • 1 comments

RE2 is faster than ICU (used by NSRegularExpression) for most types of expression at both matching and compiling.

It may be prudent to benchmark some example regexes of the kind likely to be used in Santa (generally short input text, not recompiled regularly, heavy use of OR) before implementing. Given the benchmarks already done, though, I think it very unlikely that ICU will come out ahead on average.

For implementation: I think a very simple ObjC wrapper around RE2 would make switching and memory management easier. It would also limit the exposure to Objective-C++ to just that wrapper class, which is nice.

The biggest issue is getting RE2 into the build process: the other dependencies are installed with CocoaPods and a Pod for RE2 is not available.

russellhancox avatar Oct 14 '15 19:10 russellhancox

This might not be as easy as I expected, because RE2 doesn't support negative lookahead, which is very useful, especially for the FileChangesRegex key.

Negative lookahead can have terrible performance, so it might be worth adding a way to mark the FileChangesRegex key as negated (i.e. apply to everything the pattern does not match) instead.
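
To illustrate the idea, here is a rough sketch in Python (chosen only because its `re` module supports lookahead, so both forms can be compared side by side). The path prefixes, the function names, and the `negate` flag are all hypothetical; nothing here reflects an existing Santa configuration key. It shows a negative-lookahead pattern next to an equivalent lookahead-free pattern whose result is inverted by a flag, which is the form an engine without lookahead could handle:

```python
import re

# Hypothetical prefixes, for illustration only.
# Form 1: negative lookahead. Matches any path that does NOT start
# with one of the listed prefixes.
LOOKAHEAD = re.compile(r"^(?!/private/tmp/|/Library/Caches/)")

# Form 2: the same prefixes as a plain positive pattern (no lookahead).
# The inversion is done outside the regex by a flag.
POSITIVE = re.compile(r"^(?:/private/tmp/|/Library/Caches/)")


def matches_with_lookahead(path: str) -> bool:
    """True if the path passes the negative-lookahead pattern."""
    return LOOKAHEAD.search(path) is not None


def matches_negated(path: str, negate: bool = True) -> bool:
    """True if the positive pattern hits, inverted when negate is set."""
    hit = POSITIVE.search(path) is not None
    return (not hit) if negate else hit


# Both forms agree on every sample path.
for p in ["/private/tmp/scratch.txt", "/Library/Caches/x", "/usr/bin/ls", "relative/path"]:
    assert matches_with_lookahead(p) == matches_negated(p)
```

This only covers the case where the whole expression is a single negated condition. A lookahead embedded in the middle of a larger pattern cannot be rewritten this simply.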

russellhancox avatar Feb 29 '16 17:02 russellhancox

This isn't going to happen: negative lookahead is too useful to give up. There are alternatives using prefix trees with negative policies, but regexes will continue to be supported, so we cannot lose this ability.

russellhancox avatar Dec 13 '22 16:12 russellhancox