
`match_files()` is not a pure generator function, and it impacts `tree_*()` gravely

Open orens opened this issue 4 years ago • 0 comments

Hey @cpburnz, thanks for the great lib! In `match_files()` (https://github.com/cpburnz/python-path-specification/blob/c00b332b2075548ee0c0673b72d7f2570d12ffe6/pathspec/pathspec.py#L170), the line

file_map = util.normalize_files(files, separators=separators)

(L190) requires `files` to be completely exhausted before even the first file is matched. If `files` is list-like, this is not a problem, but when `match_files()` is called from the `tree_*()` methods, the iterator machinery becomes pretty much useless. It also means that if I have an ignored folder with a very complex structure, pathspec will still search through all of it, even though nothing inside it can affect the results.
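To illustrate the difference, here is a minimal sketch that does not use pathspec at all. `fake_tree`, `eager_match` and `lazy_match` are hypothetical stand-ins, and the `.py` suffix check replaces real pattern matching; the only point is to count how many paths get pulled from the iterator before the first match comes out.

```python
consumed = []  # records every path the fake walk has produced so far


def fake_tree():
    """Hypothetical stand-in for a lazy directory walk."""
    for i in range(1000):
        path = f"src/file{i}.py"
        consumed.append(path)
        yield path


def eager_match(files):
    # Same shape as the current code: a dict is built from *all* files first.
    file_map = {f: f for f in files}
    for f in file_map:
        if f.endswith(".py"):  # stand-in for the real pattern match
            yield f


def lazy_match(files):
    # Matches each path as it arrives, without buffering the input.
    for f in files:
        if f.endswith(".py"):  # stand-in for the real pattern match
            yield f


first_eager = next(eager_match(fake_tree()))
eager_consumed = len(consumed)

consumed.clear()
first_lazy = next(lazy_match(fake_tree()))
lazy_consumed = len(consumed)

print(eager_consumed, lazy_consumed)
```

Both versions return the same first path, but the eager one has to walk the entire input before it can yield anything.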

As an example, for an automation I'm writing on a real-life repository containing a frontend application, the scan of the npm-generated files ran for about 10 minutes without yielding a single result, at which point I gave up and stopped it.

I think a possible solution is to remove this dictionary and simply do:

for file in files:
  if util.match_file(self.patterns, util.normalize_file(file)):
    yield file

(I bypassed `util.match_files()` here because it, too, is not a generator and will try to convert `files` to a list first.)
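Lazy matching alone does not stop the walk from descending into an ignored folder. One way to handle that part, sketched here with the standard library only, is to prune directories during the walk. `iter_tree_pruned` and the `is_ignored_dir` callback are hypothetical names, not part of pathspec; a real version would consult the spec's patterns instead of the hard-coded `node_modules` check.

```python
import os
import tempfile


def iter_tree_pruned(root, is_ignored_dir):
    """Lazily yield file paths relative to root, skipping ignored directories."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Editing dirnames in place tells os.walk not to descend into
        # the directories that were removed.
        dirnames[:] = [
            d for d in dirnames
            if not is_ignored_dir(os.path.relpath(os.path.join(dirpath, d), root))
        ]
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def make_demo_tree():
    """Build a tiny throwaway tree with a nested node_modules folder."""
    root = tempfile.mkdtemp()
    for rel in ("src/app.js", "node_modules/pkg/index.js", "node_modules/pkg/lib/a.js"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    return root


root = make_demo_tree()
found = sorted(
    p.replace(os.sep, "/")
    for p in iter_tree_pruned(root, lambda d: os.path.basename(d) == "node_modules")
)
print(found)
```

Note that pruning a directory outright assumes nothing beneath it can be re-included later. That matches gitignore behaviour, where a file cannot be re-included once its parent directory is excluded, but it would need checking against how pathspec treats negated patterns.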

orens, Oct 19 '21 11:10