Can't define recursive regex match expressions

Open Hendiadyoin1 opened this issue 3 years ago • 3 comments

You can't define recursive regex patterns to match. This would be useful when matching something that needs balanced parentheses and you are already using a begin/end rule, or that method is not applicable.

Example

{
    "$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
    "scopeName": "source.test",
    "name": "A test",
    "patterns": [
        {
            "include": "#my-recursive-using-thing"
        }
    ],
    "repository": {
        "my-recursive-using-thing": {
            "match": "(\\((?:[^()]+|(?1))\\s+or\\s+(?:[^()]+|(?1))\\))\\s+([_-]?[A-Za-z][0-9A-Z_a-z-]*)(;)",
            "captures": {
                "1": {
                    "patterns": [
                        {
                            "include": "#my-recursive-thing"
                        }
                    ]
                },
                "2": {
                    "name": "entity.name"
                }
            }
        },
        "my-recursive-thing": {
            "match": "(\\(([^()]+|(?1))\\s+(or)\\s+((?1)|[^()]+)\\))",
            "captures": {
                "1": {
                    "patterns": [
                        {
                            "include": "#my-recursive-thing"
                        },
                        {
                            "match": "[_-]?[A-Za-z][0-9A-Z_a-z-]*",
                            "name": "storage.type"
                        }
                    ]
                },
                "2": {
                    "name": "keyword.operator.or"
                },
                "3": {
                    "patterns": [
                        {
                            "include": "#my-recursive-thing"
                        },
                        {
                            "match": "[_-]?[A-Za-z][0-9A-Z_a-z-]*",
                            "name": "storage.type"
                        }
                    ]
                }
            }
        }
    }
}

should match both:

(foo or bar) one_test;
(foo or (another_foo or bar)) test;

This uses the PCRE recursion feature, which is currently not supported, and the rule cannot easily be written as a begin/end matcher.

Solutions

Use PCRE(2) like mainline VS Code for lookups.

Additional Benefits:

  • You could allow predefining frequently used templates to avoid unnecessary repetition. In this example, [_-]?[A-Za-z][0-9A-Z_a-z-]* could be defined once under a name, and all further references could just use (?&name).
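
For what it's worth, Oniguruma (the engine the comments below identify as the one actually in use) already allows this kind of reuse inside a single regex: a named group `(?<name>...)` can be called again with `\g<name>`. A minimal sketch, untested in VS Code; the group name `ident` and the scope name are made up for illustration. Note that the name only exists within that one `match` string, so it does not give grammar-wide predefinitions.

```json
{
    "name": "meta.or-expression.test",
    "match": "(?<ident>[_-]?[A-Za-z][0-9A-Z_a-z-]*)\\s+or\\s+\\g<ident>"
}
```

This would match `foo or bar` while writing the identifier pattern only once.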

Hendiadyoin1 · Jun 06 '22 14:06

If my memory is correct, recursive back references should work. We've done balanced parentheses using this in the C++ syntax. (I'm not a vs code contributor btw)

It's not PCRE though, it's Oniguruma. https://macromates.com/manual/en/regular_expressions

So the (?1) shouldn't work as a subexpression or a backreference. The syntax for subexpression is \g<1> but I don't remember testing it in VS Code.

The subexpression might be screwing up the whole execution, making it look like the backref doesn't work.

jeff-hykin · Jun 08 '22 10:06

I'll try that then, I did not know about the Oniguruma dialect.

Hendiadyoin1 · Jun 08 '22 12:06

(?1) will cause an error and silently fail.

Even after replacing (?1) with \\g<1>, it seems that the example enters catastrophic backtracking and fails completely.

A list of all of Oniguruma's expressions can be found in the doc section of its GitHub repository: https://github.com/kkos/oniguruma/blob/master/doc/RE

RedCMD · Jun 09 '22 01:06
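
To make the suggested syntax concrete, here is a rough sketch of the first rule from the example with `(?1)` swapped for Oniguruma's subexpression call `\g<1>` (written `\\g<1>` in JSON). One change beyond that swap is purely an assumption on my part: the operand class is narrowed from `[^()]+` to `[^()\\s]+`, so an operand can no longer swallow the ` or ` separator. This has not been tested in VS Code, and it is not confirmed to avoid the backtracking problem reported above.

```json
{
    "my-recursive-using-thing": {
        "match": "(\\((?:[^()\\s]+|\\g<1>)\\s+or\\s+(?:[^()\\s]+|\\g<1>)\\))\\s+([_-]?[A-Za-z][0-9A-Z_a-z-]*)(;)",
        "captures": {
            "2": {
                "name": "entity.name"
            }
        }
    }
}
```

Read by hand, this should accept both `(foo or bar) one_test;` and `(foo or (another_foo or bar)) test;`, since the call to group 1 only happens after the opening parenthesis has been consumed.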

I think this issue can be closed

jeff-hykin · Jun 03 '23 12:06