Add named capturing groups to gleam/regex.Match
Currently you can write named capturing groups, but their names don't appear in the matches returned from regex.scan; only the positional submatches do.
import gleam/io
import gleam/regex

pub fn main() {
  // `re` avoids shadowing the `regex` module name
  let assert Ok(re) = regex.from_string("(?<delim>,)")
  let content = "Hello, world!"
  regex.scan(re, content)
  |> io.debug
}
[Match(content: ",", submatches: [Some(",")])]
I propose that Match have the following definition:
pub type Match {
  Match(
    /// The full string of the match.
    content: String,
    /// A `Regex` can have subpatterns, sub-parts that are in parentheses.
    /// Each submatch is paired with the name of its group.
    submatches: List(#(String, Option(String))),
  )
}
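To show how code would consume the proposed shape, here is a minimal sketch of looking up a capture by group name. `NamedMatch` and `group` are hypothetical names used only for illustration (a local copy of the proposed type, so nothing in gleam/regex has to change); the lookup itself uses `list.key_find` from the standard library.

```gleam
import gleam/list
import gleam/option.{type Option, None, Some}

/// Hypothetical local copy of the proposed `Match` shape, so this sketch
/// compiles without modifying gleam/regex.
pub type NamedMatch {
  NamedMatch(content: String, submatches: List(#(String, Option(String))))
}

/// Returns the text captured by the group called `name`. Under the
/// proposal, unnamed groups are keyed by their position (e.g. "2").
/// Returns None if there is no such group or it did not participate.
pub fn group(m: NamedMatch, name: String) -> Option(String) {
  case list.key_find(m.submatches, name) {
    Ok(captured) -> captured
    Error(Nil) -> None
  }
}

pub fn main() {
  // Values mirror the example output given in the proposal.
  let m =
    NamedMatch(content: "new match_type", submatches: [
      #("type_of", Some("new")),
      #("2", Some("match_type")),
    ])
  let assert Some("new") = group(m, "type_of")
  let assert Some("match_type") = group(m, "2")
  let assert None = group(m, "missing")
  Nil
}
```

Returning `Option(String)` rather than `Result` keeps "group missing" and "group did not match" in one case, which matches how `submatches` already reports non-participating groups.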
With that change, running something like this
import gleam/io
import gleam/regex

pub fn main() {
  // `re` avoids shadowing the `regex` module name
  let assert Ok(re) = regex.from_string("(?<type_of>new|old)\\s+(\\w+)")
  let content = "new match_type"
  regex.scan(re, content)
  |> io.debug
}
would give you this
[Match(content: "new match_type", submatches: [#("type_of", Some("new")), #("2", Some("match_type"))])]
If the concern is that this may break existing code that uses regex, a new option could be added instead (though adding a field to Options would still break existing calls that construct Options for regex.compile), or a new groups function could be added that returns just a list of (group name, submatch) tuples.
How would we handle groups with no names?
Group names can be Option(String) then: if a group is named, its name is Some(name), otherwise None. IIRC Erlang uses PCRE for its regex implementation, so named groups are technically supported. Though I'm unsure whether Erlang's match results give you group names the way JavaScript does.
Here's how JavaScript handles it:
>> /(?<type_of>new|old)\s+(\w+)/.exec("new match_type")
Array(3) [ "new match_type", "new", "match_type" ]
  0: "new match_type"
  1: "new"
  2: "match_type"
  groups: Object { type_of: "new" }
    type_of: "new"
  index: 0
  input: "new match_type"
  length: 3
  <prototype>: Array []
Due to how Erlang handles matches, though, it may be necessary to pass the desired group names in Options instead: re:run returns only positional captures unless names are requested through the capture option.
1> re:run("new match_type", "(?<type_of>new|old)\\s+(\\w+)").
{match,[{0,14},{0,3},{4,10}]}
2> re:run("new match_type", "(?<type_of>new|old)\\s+(\\w+)", [{capture, ["type_of"], list}]).
{match,["new"]}
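Combining the Option(String) naming idea with the fact that Erlang only reports names you ask for, here is a minimal sketch of the proposed groups function built on top of the existing gleam/regex API. `groups` is a hypothetical helper, not part of gleam/regex; because a `Match` carries no group names, the caller supplies them in capture order (None for unnamed groups), much like listing names in re:run's capture option.

```gleam
import gleam/list
import gleam/option.{type Option, None, Some}
import gleam/regex

/// Hypothetical `groups` helper (not part of gleam/regex). The caller
/// passes group names in capture order, with None for unnamed groups.
/// `list.zip` drops trailing elements if the two lists differ in length.
pub fn groups(
  m: regex.Match,
  names: List(Option(String)),
) -> List(#(Option(String), Option(String))) {
  list.zip(names, m.submatches)
}

pub fn main() {
  let assert Ok(re) = regex.from_string("(?<type_of>new|old)\\s+(\\w+)")
  let assert [m] = regex.scan(re, "new match_type")
  let assert [#(Some("type_of"), Some("new")), #(None, Some("match_type"))] =
    groups(m, [Some("type_of"), None])
  Nil
}
```

The obvious drawback is that the name list can drift out of sync with the pattern; a real implementation would need the names from the compiled regex itself.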
I see now that gleam/regex is meant to be a port of Elm's Regex package, which unfortunately has no way to extract capturing group names either. Diverging from it would have to be a deliberate decision. 🤷
> I see now gleam/regex is meant to be a port of Elm's Regex package
It's not a port; the similarity is incidental.