Different gitignore pattern interpretations between python-pathspec and Git

Open tomokinakamaru opened this issue 3 months ago • 3 comments

I've noticed that the following four gitignore patterns are interpreted differently between python-pathspec and Git:

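| Pattern | Path | python-pathspec | Git |
|---|---|---|---|
| `foo**/bar` | `foobar` | Tracked [^test1] | Ignored |
| ` foo` (leading space) | ` foo` (leading space) | Tracked [^test2] | Ignored |
| `[` | `[` | Ignored [^test3] | Tracked |
| `[!]` | `[!]` | Ignored [^test4] | Tracked |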

In my opinion, Git's behavior here is somewhat confusing, but given how widely python-pathspec is used[^note], I wanted to report these differences for clarification. Ideally, python-pathspec would follow Git exactly, but I understand if it behaves differently in these edge cases.

Are these differences expected or known issues?

[^test1]: https://github.com/tomokinakamaru/python-pathspec/blob/e2677495f64a5b19249184b9f1027adac45cd8b8/tests/test_03_pathspec.py#L57-L63 -> test failure
[^test2]: https://github.com/tomokinakamaru/python-pathspec/blob/e2677495f64a5b19249184b9f1027adac45cd8b8/tests/test_03_pathspec.py#L65-L71 -> test failure
[^test3]: https://github.com/tomokinakamaru/python-pathspec/blob/e2677495f64a5b19249184b9f1027adac45cd8b8/tests/test_03_pathspec.py#L73-L79 -> test failure
[^test4]: https://github.com/tomokinakamaru/python-pathspec/blob/e2677495f64a5b19249184b9f1027adac45cd8b8/tests/test_03_pathspec.py#L81-L87 -> test failure
[^note]: I am concerned that cloud-related tools such as awsebcli, which rely on python-pathspec to determine upload inclusion, could inadvertently upload unintended files to a server.

tomokinakamaru, Oct 08 '25 01:10

Thanks for the bug report. I've confirmed your results with PathSpec using gitwildmatch, and GitIgnoreSpec. While these are all odd edge cases, I do want to replicate the behavior of git as closely as possible.

Pattern foo**/bar

I never considered this edge case. Currently, pathspec treats it as foo*/bar. The behavior of git is really odd for this. I see no reason why it should match the path foobar. I need to experiment with this kind of pattern more before I can fix it.

$ git check-ignore -v 'foobar'
.gitignore:1:foo**/bar  foobar
>>> from pathspec import GitIgnoreSpec, PathSpec
>>> spec = PathSpec.from_lines('gitwildmatch', ['foo**/bar'])
>>> files = {'foobar'}
>>> ignored = set(spec.match_files(files))
>>> print(ignored)
set()
>>> tracked = files - ignored
>>> print(tracked)
{'foobar'}
>>> spec = GitIgnoreSpec.from_lines(['foo**/bar'])
>>> files = {'foobar'}
>>> ignored = set(spec.match_files(files))
>>> print(ignored)
set()
>>> tracked = files - ignored
>>> print(tracked)
{'foobar'}

Pattern ' foo' (with a leading space)

Currently, pathspec requires the leading space to be escaped, but git's behavior is reasonable here.

$ git check-ignore -v ' foo'
.gitignore:1: foo        foo
>>> spec = PathSpec.from_lines('gitwildmatch', [' foo'])
>>> files = {' foo'}
>>> ignored = set(spec.match_files(files))
>>> print(ignored)
set()
>>> tracked = files - ignored
>>> print(tracked)
{' foo'}
>>> spec = GitIgnoreSpec.from_lines([' foo'])
>>> files = {' foo'}
>>> ignored = set(spec.match_files(files))
>>> print(ignored)
set()
>>> tracked = files - ignored
>>> print(tracked)
{' foo'}
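
As a rough illustration of the whitespace rule discussed here, the sketch below keeps leading spaces and drops only trailing spaces that are not escaped with a backslash. The helper name `strip_trailing_spaces` is hypothetical; this is not pathspec's or Git's actual code, only a stdlib-only approximation of the rule as the gitignore documentation describes it.

```python
def strip_trailing_spaces(line: str) -> str:
    """Hypothetical helper: drop unescaped trailing spaces, keep leading ones."""
    end = len(line)
    while end > 0 and line[end - 1] == " ":
        # Count the backslashes directly before this space.
        backslashes = 0
        i = end - 2
        while i >= 0 and line[i] == "\\":
            backslashes += 1
            i -= 1
        # An odd number of backslashes means the space itself is escaped.
        if backslashes % 2 == 1:
            break
        end -= 1
    return line[:end]


# Leading space survives; trailing unescaped spaces do not.
print(repr(strip_trailing_spaces(" foo")))
print(repr(strip_trailing_spaces(" foo  ")))
print(repr(strip_trailing_spaces("foo\\ ")))
```

Under this assumption the pattern line `' foo'` reaches the matcher unchanged, which is consistent with Git ignoring the path `' foo'` above.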

Patterns [ and [!]

Currently when pathspec can't parse a [...] range expression, it treats it as the literal characters. Git probably discards it instead, but I need to experiment on this kind of pattern more.

$ git check-ignore -v '['
>>> spec = PathSpec.from_lines('gitwildmatch', ['['])
>>> files = {'['}
>>> ignored = set(spec.match_files(files))
>>> print(ignored)
{'['}
>>> tracked = files - ignored
>>> print(tracked)
set()
>>> spec = GitIgnoreSpec.from_lines(['['])
>>> files = {'['}
>>> ignored = set(spec.match_files(files))
>>> print(ignored)
{'['}
>>> tracked = files - ignored
>>> print(tracked)
set()
$ git check-ignore -v '[!]'
>>> spec = PathSpec.from_lines('gitwildmatch', ['[!]'])
>>> files = {'[!]'}
>>> ignored = set(spec.match_files(files))
>>> print(ignored)
{'[!]'}
>>> tracked = files - ignored
>>> print(tracked)
set()
>>> spec = GitIgnoreSpec.from_lines(['[!]'])
>>> files = {'[!]'}
>>> ignored = set(spec.match_files(files))
>>> print(ignored)
{'[!]'}
>>> tracked = files - ignored
>>> print(tracked)
set()
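
One way to picture the "discard" reading is a pre-check that flags patterns whose `[` is never closed. The sketch below is only an assumption about what such a check could look like: `has_unclosed_bracket` is a hypothetical helper, not part of pathspec or Git, and it ignores details such as POSIX character classes.

```python
def has_unclosed_bracket(pattern: str) -> bool:
    """Hypothetical check: True if some '[' never gets its closing ']'."""
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == "[":
            j = i + 1
            # Optional negation marker right after '['.
            if j < n and pattern[j] in "!^":
                j += 1
            # A ']' in first position is a literal member of the class.
            if j < n and pattern[j] == "]":
                j += 1
            # Look for the ']' that terminates the class.
            while j < n and pattern[j] != "]":
                if pattern[j] == "\\":
                    j += 1  # skip the escaped character
                j += 1
            if j >= n:
                return True
            i = j + 1
            continue
        i += 1
    return False


for pat in ("[", "[!]", "[abc]", "[!a]"):
    print(repr(pat), has_unclosed_bracket(pat))
```

With this reading, both `[` and `[!]` are flagged as unterminated (in `[!]` the `]` counts as the first class member, so nothing closes the class), which would explain why Git leaves those paths tracked.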

cpburnz, Oct 08 '25 03:10

Thank you for taking the time to look into them! I've reached the same conclusions as you for the last three patterns: Git preserves leading whitespace and disregards malformed patterns. As for the first pattern, foo**/bar, it seems that Git interprets it as foo*bar, although I find that interpretation unintuitive.
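
To make the two readings of foo**/bar concrete, the sketch below spells each one out as a regular expression using only the standard library. The regexes are hand-written assumptions for illustration (a single `*` is taken to match any run of characters other than `/`); they are not what pathspec or Git generates internally.

```python
import re

# Reading 1: foo**/bar handled like foo*/bar (a '/' is required before "bar").
star_slash = re.compile(r"foo[^/]*/bar")

# Reading 2: foo**/bar handled like foo*bar (no '/' required).
star_only = re.compile(r"foo[^/]*bar")

for path in ("foobar", "foox/bar"):
    print(
        path,
        bool(star_slash.fullmatch(path)),
        bool(star_only.fullmatch(path)),
    )
```

Under these assumptions only the second reading matches foobar, which lines up with the difference reported in this issue.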

I will report back if I discover anything that helps resolve this issue.

tomokinakamaru, Oct 10 '25 06:10

The release notes for Git 2.52.0 indicate that matching foobar with foo**/bar was a bug, and that the pattern will no longer match that path.

cpburnz, Nov 17 '25 02:11

I've released v1.0.0, which fixes the ' foo' (leading space) bug: leading whitespace is no longer stripped.

I still need to address [ and [!] more.

cpburnz, Jan 06 '26 03:01