[WIP] RAT-473: take global gitignore into account
Take into account exclusions from the global gitignore file if it is present in the default location. It's kind of icky to read environment variables in code called from an enum constructor, but I didn't see a more elegant solution yet either. Relatedly, I have not yet created a unit test (or made sure that a global gitignore will not interfere with a global gitignore being present), hence the WIP status.
https://git-scm.com/docs/gitignore
@raboof would you mind creating a RAT-jira ticket to describe your intentions? Thanks
Do you have a use case where I have something in my global gitignore that is not part of my project? It seems a bit mysterious to me. Thx for a bit more context.
@raboof would you mind creating a RAT-jira ticket to describe your intentions? Thanks
Do you have a use case where I have something in my global gitignore that is not part of my project? It seems a bit mysterious to me. Thx for a bit more context.
Done (RAT-473). Yes, I put some things in there for development tools that I use and that use files in the project directory. That way I don't need to have a perpetually-dirty .gitignore in the project or contribute those unusual special cases to the upstream .gitignore.
@raboof Are some of the things you put in your private ignore file found in other StandardCollection entries? Would it make sense to create more StandardCollection entries for anything that is missing?
The changes that you want to work against are in Pull request #439 Specifically:
- in the
AbstractFileProcessorBuilder.build(final DocumentName root)you want to add a call to a new methodvoid initializeBuilder(final DocumentName root)that you will create inAbstractFileProcessorBuilderas a method that does nothing. - in
GitIgnoreBuilderyou want to implementinitializeBuilder()to perform the file name lookup logic and if a file is found process it using by createing aLevelBuilderfor level-1and callingprotected MatcherSet process(final Consumer<MatcherSet> matcherSetConsumer, final DocumentName root, final DocumentName documentName)to process the file.
See AbstractFileProcessBuilder.checkDirectory() for an example of creating and using the LevelBuilder.
Bascically the system works by creating MatcherSets for each level (.gitignore) file and then applies them in reverse order so that the later ones override the earlier ones as is requried by the gitignore spec. A MatcherSet has two matchers, one that specifies the files that are to be explicitly included (overrides earlier excludes), and one to list files to exclude. The process() method will do that correctly if you have the proper root and documentName specified.
@raboof all code changes are now in master. Please feel free to build against that. There are more changes coming but they are mostly testing and should not impact your changes.
Great, thanks for the pointers! This will likely be on the backburner for a while but I'll return to it eventually :)
Sorry it has taken me soooo long to get to this PR. I'll try to do better going forward.
No worries!
(I understand the problem with the previous implementation. I have mostly applied the suggested changes, but I don't fully understand all parts yet, and would like to have a closer look at both the implementation and the test - hence 'draft' status)
I see in the current implementation indeed includes (i.e. explicitly "not-ignored" patterns) take precedence over excludes (ignores), regardless of where they're specified. Is that really correct for Git? AFAICT a rule in the repo directory should take precedence over a rule in the global gitignore, regardless of whether it's an include or an exclude.
https://git-scm.com/docs/gitignore
The RAT process assumes all files are included by default. So the exclude files do just that list exceptions. Then there is the exception to the exception, where a file is explicitly added back. Once added back we do not support removing them again. This is the basic process for RAT inclusion/exclusion.
In order to provide scoped include/exclude and precedence of more specificity over less specificity the process splits the include/exclude into 2 sets internally. So if we have 3 layers of specificity we have 3 sets of include rules and 3 sets of exclude rules. When we execute the rules we execute the include rules first in order of most specific to least specific set we then execute the exclude rules in the same order. The first rule that fires determines how the files is treated. If no rule fires than we include it (default).
As I write this, I do not have the code in front of me. So the above may be slightly incorrect.
If label the includes as I1- I3 with I3 having the highest precedence, and similarly excludes as E1-E3. I think we execute I3, I2, I1, E3, E2, E1 But it may actually be coded as I3, E3, I2, E2, I1, E1.
Using the decomposition example I provided earlier we should be able to determine how it actually executes. It also should not be too much work to put them in a different order. If we need to change the ordering let's do that with a different issue and PR please.
This weekend I am on holiday with friends and so am away from the computer.
As I write this, I do not have the code in front of me. So the above may be slightly incorrect.
If label the includes as I1- I3 with I3 having the highest precedence, and similarly excludes as E1-E3. I think we execute I3, I2, I1, E3, E2, E1
I think that is a correct description of how RAT currently works. AFAICT that means precedence doesn't have any effect: includes always precede excludes, and the ordering within the includes/excludes doesn't seem observable.
Once added back we do not support removing them again.
That seems like a reasonable limitation.
It also should not be too much work to put them in a different order. If we need to change the ordering let's do that with a different issue and PR please.
OK! Filed https://issues.apache.org/jira/browse/RAT-476
@raboof would you mind adding a changelog entry to describe the newly added feature? thx src/changes/changes.xml
Thank you for your patience and hard work
Thanks for your in-depth review and extensive advice!
hm, during a final check I see behavior that I can't quite place when no global gitignore exists - let me draft this until I find time to convince myself that this is OK :)
hm, during a final check I see behavior that I can't quite place when no global gitignore exists - let me draft this until I find time to convince myself that this is OK :)
OK, the differences I observed are also present between 1.16.1 and the version on the current main branch, so not introduced by this PR.
What is the difference?
LinkedIn: http://www.linkedin.com/in/claudewarren
On Fri 30 May 2025, 16:30 Arnout Engelen, @.***> wrote:
raboof left a comment (apache/creadur-rat#433) https://github.com/apache/creadur-rat/pull/433#issuecomment-2922709945
hm, during a final check I see behavior that I can't quite place when no global gitignore exists - let me draft this until I find time to convince myself that this is OK :)
OK, the differences I observed are also present between 1.16.1 and the version on the current main branch, so not introduced by this PR.
— Reply to this email directly, view it on GitHub https://github.com/apache/creadur-rat/pull/433#issuecomment-2922709945, or unsubscribe https://github.com/notifications/unsubscribe-auth/AASTVHXBK6WQT3KYWKRXSUL3BB2SLAVCNFSM6AAAAABWK2DV3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDSMRSG4YDSOJUGU . You are receiving this because you were mentioned.Message ID: @.***>
What is the difference? LinkedIn: http://www.linkedin.com/in/claudewarren
It seems current master detects some files without licenses that the 1.16.1 didn't. At first glance it looks like the main branch is correct, but it might be worth a double-check - I put the list in a separate issue, https://issues.apache.org/jira/browse/RAT-477