
Std.regex: incorrect values of look-around captures

Open Viorel opened this issue 11 months ago • 1 comments

The following example, checked with dmd 2.109.1, uses a lookahead assertion nested inside a lookbehind assertion:

import std.stdio;
import std.regex;

void main()
{
    auto re = regex(r"(?<=(..)(?=(..)))..cde");
    auto captures = std.regex.matchFirst("12345abcde", re);
    writeln(captures[0]); // "abcde" as expected
    writeln(captures[1]); // "45" as expected
    writeln(captures[2]); // nothing, but "ab" is expected
}

The value of captures[2] should be “ab”, but it is null. (Other prominent engines, in various languages, give the correct result.)

According to the documentation, the std.regex library should support “arbitrary length and complexity lookbehind, including lookahead in lookbehind and vice-versa”.

The modified patterns, such as (?<=(..))(?=(..))..cde, seem to work correctly.
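
Until the nested form is fixed, the modified pattern can serve as a workaround. The following is a minimal sketch, not a confirmed fix: the helper name `captureList` is hypothetical and exists only for illustration, and the output is deliberately not annotated, since the only evidence here is that the pattern "seems to work".

```d
import std.regex;
import std.stdio;

// Hypothetical helper (not part of std.regex): collects the whole match
// and every capture group of the first match into an array of strings.
string[] captureList(string pattern, string input)
{
    auto m = matchFirst(input, regex(pattern));
    string[] result;
    foreach (group; m)
        result ~= group; // unmatched groups come back as null slices
    return result;
}

void main()
{
    // Workaround pattern from the report: the lookahead is placed next to
    // the lookbehind instead of being nested inside it.
    auto caps = captureList(r"(?<=(..))(?=(..))..cde", "12345abcde");
    foreach (i, c; caps)
        writefln("captures[%s] = %s", i, c);
}
```

For this fixed-width case, both patterns assert the same thing: two characters before the current position and two characters after it. Only the nesting differs.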

Viorel avatar Mar 05 '25 19:03 Viorel

Verified with regex101; we’re apparently the outlier here.

0xEAB avatar Mar 08 '25 18:03 0xEAB