HTMLawed

Regex is slow

Open agulabon11 opened this issue 1 year ago • 1 comment

The preg_match() call on line 452 can be very slow in PHP 7.4 (but not in 8.1, as far as I can tell).

Sometimes it takes seconds to minutes to parse.

if (!preg_match('`^(/?)([a-z][^ >]*)([^>]*)>(.*)`sm', $t[$i], $m)) {

For some reason it struggles with a sequence of many short lines (100 lines, 10 characters each), followed by a sequence of very long lines (10 lines, 10k long each). That sequence takes more than 1 minute to parse.

agulabon11, Jul 02 '24 12:07

Test code:

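// 100 short lines, 10 characters each
$t = "";
for ($i = 0; $i < 100; $i++) {
        $t .= "1234567890\n";
}

// 10 very long lines, 10,000 characters each
for ($i = 0; $i < 10; $i++) {
        $t .= str_repeat("x", 10000) . "\n";
}

// The subject contains no ">", so the pattern cannot match;
// all of the time is spent failing.
if (preg_match('`^(\/?)([a-z][^ >]*)([^>]*)>(.*)`sm', $t, $m)) {
  print("match");
}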

It takes 15 seconds for me.
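
To see how the time grows with line length, here is a rough timing sketch. The helpers build_subject() and time_match() are names made up for this example and are not part of HTMLawed. The long lines are deliberately shorter than in the test above so the script finishes quickly. My guess, not confirmed, is that the two adjacent greedy classes [^ >]* and [^>]* backtrack against each other when no ">" is present, so the time should grow much faster than linearly on an affected setup.

```php
<?php
// Hypothetical helper: builds input shaped like the test above
// (short lines first, then long lines of "x"), with configurable sizes.
function build_subject(int $shortLines, int $longLines, int $longLen): string
{
    return str_repeat("1234567890\n", $shortLines)
         . str_repeat(str_repeat("x", $longLen) . "\n", $longLines);
}

// Hypothetical helper: runs preg_match() once and returns
// [result, elapsed seconds, preg_last_error() code].
function time_match(string $pattern, string $subject): array
{
    $start = microtime(true);
    $result = preg_match($pattern, $subject, $m);
    $elapsed = microtime(true) - $start;
    return [$result, $elapsed, preg_last_error()];
}

// Pattern copied from the line quoted in this issue.
$pattern = '`^(/?)([a-z][^ >]*)([^>]*)>(.*)`sm';

// Assumption: these lengths are small enough to finish quickly;
// raise them toward 10000 to approach the original test.
foreach ([500, 1000, 2000] as $len) {
    [$result, $elapsed, $err] = time_match($pattern, build_subject(100, 10, $len));
    printf(
        "len=%5d result=%s time=%.3fs preg_last_error=%d\n",
        $len,
        var_export($result, true),
        $elapsed,
        $err
    );
}
```

If preg_match() returns false instead of 0, the preg_last_error() code is worth checking (for example against PREG_BACKTRACK_LIMIT_ERROR), since a run that hits a PCRE limit stops early and looks fast for the wrong reason.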

agulabon11, Jul 02 '24 12:07