pants icon indicating copy to clipboard operation
pants copied to clipboard

Support nested .gitignore files in --pants-ignore-use-gitignore

Open stuhood opened this issue 8 years ago • 15 comments

The --pants-ignore option causes pants to ignore files and directories: it uses the same syntax as .gitignore because it is implemented using the ignore crate. The ignore crate also supports consuming .gitignore files from disk, but we're currently not using that support.

This ticket would deal with adding a pants level option next to --pants-ignore that would enable using the .gitignore file in addition to the explicit patterns in --pants-ignore. If the .gitignore file patterns were applied first, this would allow for maximum generality: with that option enabled, --pants-ignore would be a superset of .gitignore, but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.

--pants-ignore is declared here: https://github.com/pantsbuild/pants/blob/2d4e61753d3896ee969ba239cb0a66c36d781f84/src/python/pants/option/global_options.py#L127-L131 and ends up being used here: https://github.com/pantsbuild/pants/blob/2d4e61753d3896ee969ba239cb0a66c36d781f84/src/rust/engine/fs/src/lib.rs#L388-L393

stuhood avatar Apr 09 '18 20:04 stuhood

but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.

How would this work? Mind writing an example please?

Eric-Arellano avatar Jan 30 '20 19:01 Eric-Arellano

but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.

How would this work? Mind writing an example please?

I think that I was thinking...?:

$ cat .gitignore
ignore/these/**/*.py
$ grep -A2 'pants_ignore' pants.ini
pants_ignore: [
    '!ignore/these/but/not/these/*.py',
  ]

Not a big deal though... not sure the usecase is that important.

stuhood avatar Jan 30 '20 23:01 stuhood

Bumping the priority on this. Filesystem specs make it more important than in the past. E.g., ./pants cloc '**' will now see a lot of files it wouldn't previously have, including in our own repo's case large rust build byproducts etc.

benjyw avatar Feb 25 '20 00:02 benjyw

But note that repos can have non top-level .gitignore (this repo does!) and those are evaluated on the relpath from the .gitignore file, so it's more complicated than just tacking .gitignore lines onto the pants ignore list.

For example, we ignore rust build cruft in src/rust/engine/.gitignore, and it's actually quite important to pants-ignore that cruft.

benjyw avatar Feb 25 '20 01:02 benjyw

Or maybe the ignore crate handles multiple nested .gitignore files naturally already?

benjyw avatar Feb 25 '20 01:02 benjyw

If I'm reading this documentation correctly, it looks like the ignore crate supports parsing hierarchically-defined .gitignore files: https://docs.rs/ignore/0.4.11/ignore/struct.WalkBuilder.html#method.parents

gshuflin avatar Feb 27 '20 00:02 gshuflin

Just had a user run into a git-ignored directory that contained BUILD files causing them trouble. This would still be a really good thing to do.

stuhood avatar Mar 16 '20 19:03 stuhood

https://github.com/pantsbuild/pants/pull/9310 gets us part of the way there. We still need to make pants pay attention to all nested .gitignore files, rather than just a top-level one.

gshuflin avatar Apr 10 '20 17:04 gshuflin

One of my users hit this when I added a recursive glob on a data directory.

Mapping targets took several minutes and we couldn't figure out why. After it completed I asked what the line output of list was: 383k files 🙈

thejcannon avatar Apr 15 '22 21:04 thejcannon

I forget why this wasn't easily solved by the ignore crate, but I vaguely remember some difficulty. @gshuflin do you happen to recall?

benjyw avatar Apr 16 '22 09:04 benjyw

It's definitely been a while since I've thought about this - IIRC, the problem was something like, it was difficult to make the code from the rust ignore crate that handled recursive .gitignore files in a filesystem (code that might've lived somewhere around here: https://github.com/pantsbuild/pants/pull/9310/files#diff-a1af714f77a06a94025988e047198428068ac449953a5c0297977b99567e2d63 ) interact with the virtual filesystem abstraction. It expected being able to traverse a real filesystem, and so we would've had to reimplement the logic that ignore had internally in the context of the pants fs module. And we judged this wasn't worth doing at the time since we were able to modify our top-level .gitignore to ignore the nested files we cared about ignoring.

gshuflin avatar Apr 16 '22 19:04 gshuflin

It's not prohibitively hard to glob for the gitignore files from the virtual FS and then construct the ignores by hand, the only blocker is globbing is async and the scheduler is sync. I don't know Rust well enough to bridge that gap

thejcannon avatar Apr 16 '22 19:04 thejcannon

The alternative is looking the gitignores up dynamically as we try and glob paths, and caching the results.

thejcannon avatar Apr 16 '22 20:04 thejcannon

There are some comments on the implementing ticket: https://github.com/pantsbuild/pants/pull/9310#pullrequestreview-377237911

The idea of holding onto ignore state as we descend the tree might be related to some of the optimizations mentioned in #14890, but I haven't thought about this particular problem recently.

stuhood avatar Apr 16 '22 20:04 stuhood

I found this while looking for a feature request for supporting recursive gitignore files. I'm running into this problem too now. So... +1 for doing this :sweat_smile:

jriddy avatar Aug 03 '22 16:08 jriddy

another +1...

fathom-parth avatar Mar 31 '23 23:03 fathom-parth

See https://github.com/pantsbuild/pants/pull/18654 :) Thanks @cognifloyd for talking through the strategy and haha thanks ChatGPT4 for helping to fill out the implementation

Eric-Arellano avatar Apr 02 '23 02:04 Eric-Arellano

This issue has a lot of history on it, which masks the remaining work to be done. I'm going to close it in favor of #19860, which has a more focused description, and some suggested implementation.

stuhood avatar Sep 18 '23 18:09 stuhood