Add named capturing groups to gleam/regex.Match

Open yoonthegoon opened this issue 1 year ago • 6 comments

Currently you can write named capturing groups, but the group names don't appear in the list of matches returned from regex.scan; only the captured values do.

import gleam/io
import gleam/regex

pub fn main() {
  let assert Ok(regex) = regex.from_string("(?<delim>,)")
  let content = "Hello, world!"

  regex.scan(regex, content)
  |> io.debug
}

This prints:

[Match(content: ",", submatches: [Some(",")])]

I propose Match have the following signature:

pub type Match {
  Match(
    /// The full string of the match.
    content: String,
    /// A `Regex` can have subpatterns, sub-parts that are in parentheses.
    submatches: List(#(String, Option(String))),
  )
}

So now running something like this

import gleam/io
import gleam/regex

pub fn main() {
  let assert Ok(regex) = regex.from_string("(?<type_of>new|old)\\s+(\\w+)")
  let content = "new match_type"

  regex.scan(regex, content)
  |> io.debug
}

would give you this

[Match(content: "new match_type", submatches: [#("type_of", Some("new")), #("2", Some("match_type"))])]

If the concern is that this would break existing code that uses regex, one alternative is a new option (which would still break existing compiled regexes). Another is a new groups function that returns just a list of tuples of group name and submatch.

yoonthegoon avatar Jul 21 '24 18:07 yoonthegoon

How would we handle groups with no names?

lpil avatar Jul 22 '24 12:07 lpil

the group name can be an Option(String) then: Some(name) if the group is named, otherwise None. iirc erlang uses PCRE for its regex implementation, so named groups are technically supported, though i'm unsure whether the erlang match gives you group names the way javascript does.

yoonthegoon avatar Jul 22 '24 14:07 yoonthegoon

here's how javascript handles it

>> /(?<type_of>new|old)\s+(\w+)/.exec("new match_type")
Array(3) [ "new match_type", "new", "match_type" ]
  0: "new match_type"
  1: "new"
  2: "match_type"
  groups: Object { type_of: "new" }
    type_of: "new"
  index: 0
  input: "new match_type"
  length: 3
  <prototype>: Array []
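
To make the shape above concrete, here is a minimal sketch of how the positional submatches and the named groups could be collected side by side in JavaScript. The helper name scanWithNames and the returned field names (content, submatches, named) are made up for illustration and are not part of gleam/regex or any existing library; only the standard matchAll and groups behaviour is relied on.

```javascript
// Hypothetical helper for illustration only; not an existing API.
// For every match it returns the full match, all positional submatches,
// and the named groups as [name, value] pairs.
function scanWithNames(pattern, input) {
  // String.prototype.matchAll requires the global flag, so add it if missing.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const regex = new RegExp(pattern.source, flags);
  return Array.from(input.matchAll(regex), (m) => ({
    content: m[0],
    // Groups that did not participate are undefined; normalise them to null.
    submatches: m.slice(1).map((s) => s ?? null),
    // m.groups is undefined when the pattern has no named groups.
    named: Object.entries(m.groups ?? {}).map(([name, value]) => [name, value ?? null]),
  }));
}

const result = scanWithNames(/(?<type_of>new|old)\s+(\w+)/, "new match_type");
console.log(JSON.stringify(result));
// prints [{"content":"new match_type","submatches":["new","match_type"],"named":[["type_of","new"]]}]
```

Note that JavaScript keeps the named groups separate from the positional ones, so the unnamed second group only shows up in submatches. That is close to the "new groups function" idea from the original post rather than the changed Match signature.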

yoonthegoon avatar Jul 22 '24 14:07 yoonthegoon

due to how Erlang handles matches though, it may be necessary to instead include desired group names in Options.

1> re:run("new match_type", "(?<type_of>new|old)\\s+(\\w+)").
{match,[{0,14},{0,3},{4,10}]}
2> re:run("new match_type", "(?<type_of>new|old)\\s+(\\w+)", [{capture, ["type_of"], list}]).
{match,["new"]}

yoonthegoon avatar Jul 22 '24 14:07 yoonthegoon

I see now gleam/regex is meant to be a port of Elm's Regex package and they unfortunately don't have a way to extract capturing group names either. Straying from this would have to be a decision. 🤷

yoonthegoon avatar Jul 22 '24 14:07 yoonthegoon

> I see now gleam/regex is meant to be a port of Elm's Regex package

It's not a port; the similarity is incidental.

lpil avatar Jul 25 '24 10:07 lpil