specification icon indicating copy to clipboard operation
specification copied to clipboard

Clarify glob matching

Open pommicket opened this issue 2 years ago • 7 comments

Taking the specification literally, the glob *.py would not match foo/bar.py since foo/bar includes the slash character. The C library for editorconfig adds **/ to the start of the glob if it does not have a slash in it (see editorconfig.c:260), but this should really be in the spec.

pommicket avatar Oct 18 '23 23:10 pommicket

I agree. However, I don't think this is about *. bar.py will also match foo/bar.py. The issue sounds like not clarifying that if / is present, then it would match the final component of the file path.

xuhdev avatar Oct 26 '23 07:10 xuhdev

One question is: Does bar/baz.py match foo/bar/baz.py? That's something the spec isn't clear about.

xuhdev avatar Oct 26 '23 07:10 xuhdev

It seems our specification is "nearer" to the known behaviour of glob than the c implementation.

The implicit expansion with **/ fixes to the matching of the last part while the original "glob" command would apply the pattern to each part of the path and our spec suggests a behaviour alike (with less words).


https://en.wikipedia.org/wiki/Glob_(programming) https://man7.org/linux/man-pages/man7/glob.7.html

florianb avatar Oct 26 '23 08:10 florianb

For comparision, gitignore's semantics are, in relevant part:

If there is a separator at the beginning or middle (or both) of the pattern, then the pattern is relative to the directory level of the particular .gitignore file itself. Otherwise the pattern may also match at any level below the .gitignore level.

If we followed that, glob bar/baz.py would not apply to file foo/bar/baz.py.

I looked at editorconfig-core-test.

  • glob/ does not appear to have any test cases for this.
  • filetree/path_separator.in appears to have a test for this: nested_path_separator tries to match nested/path/separator against glob path/separator. The expectation is that it does not match.:
    # Tests path separator match below top of path
    new_ec_test(nested_path_separator path_separator.in nested/path/separator "^[ \t\n\r]*$")
    
    given path_separator.in contents:
    [path/separator]
    key1=value1
    

cxw42 avatar Oct 29 '23 19:10 cxw42

@cxw42 Thanks for the awesome summary -- Now looks like what's unclear in the spec is:

Section names in EditorConfig files are filepath globs, similar to the format accepted by .gitignore.

We should make this a bit more formal somehow, but it seems tricky to formulate proper language. I'm thinking maybe it's easier to clarify in the * row after all.

xuhdev avatar Nov 01 '23 07:11 xuhdev

Here's an initial draft --- it's not great, so please respond with improvements :)

 Section names ...  accepted by .gitignore. 
 They support pattern matching through Unix shell-style wildcards. 
+
+A section name that does not contain a `/` matches the last component 
+of the path at any depth below the EditorConfig file.  
+E.g., `[*.py]` matches `foo.py` and `subdir/bar.py`.
+A section name that does contain a `/` matches that section name 
+relative to the directory containing that EditorConfig file.  
+E.g., `[dir/*.py]` matches `dir/foo.py` but not `dir/subdir/foo.py`.
+
 These filepath globs recognize the following as special characters for wildcard matching:

(Separately, I realized we should also add tests and spec stating that a section name may not end with a /. Edit that is now open at https://github.com/editorconfig/editorconfig/issues/493)

cxw42 avatar Nov 05 '23 01:11 cxw42

The draft looks good to me!

xuhdev avatar Nov 09 '23 02:11 xuhdev

I just realized this overlaps quite a bit with https://github.com/editorconfig/editorconfig/issues/509 . @pommicket would you please check and see if the current spec and tests permit this issue to be closed?

cxw42 avatar Nov 24 '24 00:11 cxw42

Yes it looks good now

pommicket avatar Nov 24 '24 18:11 pommicket