creadur-rat icon indicating copy to clipboard operation
creadur-rat copied to clipboard

[WIP] RAT-473: take global gitignore into account

Open raboof opened this issue 1 year ago • 5 comments

Take into account exclusions from the global gitignore file if it is present in the default location. It's kind of icky to read environment variables in code called from an enum constructor, but I didn't see a more elegant solution yet either. Relatedly, I have not yet created a unit test (or made sure that a global gitignore will not interfere with a global gitignore being present), hence the WIP status.

https://git-scm.com/docs/gitignore

raboof avatar Feb 02 '25 22:02 raboof

@raboof would you mind creating a RAT-jira ticket to describe your intentions? Thanks

Do you have a use case where I have something in my global gitignore that is not part of my project? It seems a bit mysterious to me. Thx for a bit more context.

ottlinger avatar Feb 03 '25 09:02 ottlinger

@raboof would you mind creating a RAT-jira ticket to describe your intentions? Thanks

Do you have a use case where I have something in my global gitignore that is not part of my project? It seems a bit mysterious to me. Thx for a bit more context.

Done (RAT-473). Yes, I put some things in there for development tools that I use and that use files in the project directory. That way I don't need to have a perpetually-dirty .gitignore in the project or contribute those unusual special cases to the upstream .gitignore.

raboof avatar Feb 03 '25 14:02 raboof

@raboof Are some of the things you put in your private ignore file found in other StandardCollection entries? Would it make sense to create more StandardCollection entries for anything that is missing?

The changes that you want to work against are in Pull request #439 Specifically:

  • in the AbstractFileProcessorBuilder.build(final DocumentName root) you want to add a call to a new method void initializeBuilder(final DocumentName root) that you will create in AbstractFileProcessorBuilder as a method that does nothing.
  • in GitIgnoreBuilder you want to implement initializeBuilder() to perform the file name lookup logic and if a file is found process it using by createing a LevelBuilder for level -1 and calling protected MatcherSet process(final Consumer<MatcherSet> matcherSetConsumer, final DocumentName root, final DocumentName documentName) to process the file.

See AbstractFileProcessBuilder.checkDirectory() for an example of creating and using the LevelBuilder.

Bascically the system works by creating MatcherSets for each level (.gitignore) file and then applies them in reverse order so that the later ones override the earlier ones as is requried by the gitignore spec. A MatcherSet has two matchers, one that specifies the files that are to be explicitly included (overrides earlier excludes), and one to list files to exclude. The process() method will do that correctly if you have the proper root and documentName specified.

Claudenw avatar Feb 06 '25 17:02 Claudenw

@raboof all code changes are now in master. Please feel free to build against that. There are more changes coming but they are mostly testing and should not impact your changes.

Claudenw avatar Feb 09 '25 10:02 Claudenw

Great, thanks for the pointers! This will likely be on the backburner for a while but I'll return to it eventually :)

raboof avatar Feb 10 '25 09:02 raboof

Sorry it has taken me soooo long to get to this PR. I'll try to do better going forward.

No worries!

raboof avatar May 06 '25 17:05 raboof

(I understand the problem with the previous implementation. I have mostly applied the suggested changes, but I don't fully understand all parts yet, and would like to have a closer look at both the implementation and the test - hence 'draft' status)

raboof avatar May 20 '25 23:05 raboof

I see in the current implementation indeed includes (i.e. explicitly "not-ignored" patterns) take precedence over excludes (ignores), regardless of where they're specified. Is that really correct for Git? AFAICT a rule in the repo directory should take precedence over a rule in the global gitignore, regardless of whether it's an include or an exclude.

https://git-scm.com/docs/gitignore

raboof avatar May 23 '25 14:05 raboof

The RAT process assumes all files are included by default. So the exclude files do just that list exceptions. Then there is the exception to the exception, where a file is explicitly added back. Once added back we do not support removing them again. This is the basic process for RAT inclusion/exclusion.

In order to provide scoped include/exclude and precedence of more specificity over less specificity the process splits the include/exclude into 2 sets internally. So if we have 3 layers of specificity we have 3 sets of include rules and 3 sets of exclude rules. When we execute the rules we execute the include rules first in order of most specific to least specific set we then execute the exclude rules in the same order. The first rule that fires determines how the files is treated. If no rule fires than we include it (default).

As I write this, I do not have the code in front of me. So the above may be slightly incorrect.

If label the includes as I1- I3 with I3 having the highest precedence, and similarly excludes as E1-E3. I think we execute I3, I2, I1, E3, E2, E1 But it may actually be coded as I3, E3, I2, E2, I1, E1.

Using the decomposition example I provided earlier we should be able to determine how it actually executes. It also should not be too much work to put them in a different order. If we need to change the ordering let's do that with a different issue and PR please.

This weekend I am on holiday with friends and so am away from the computer.

Claudenw avatar May 24 '25 08:05 Claudenw

As I write this, I do not have the code in front of me. So the above may be slightly incorrect.

If label the includes as I1- I3 with I3 having the highest precedence, and similarly excludes as E1-E3. I think we execute I3, I2, I1, E3, E2, E1

I think that is a correct description of how RAT currently works. AFAICT that means precedence doesn't have any effect: includes always precede excludes, and the ordering within the includes/excludes doesn't seem observable.

Once added back we do not support removing them again.

That seems like a reasonable limitation.

It also should not be too much work to put them in a different order. If we need to change the ordering let's do that with a different issue and PR please.

OK! Filed https://issues.apache.org/jira/browse/RAT-476

raboof avatar May 24 '25 09:05 raboof

@raboof would you mind adding a changelog entry to describe the newly added feature? thx src/changes/changes.xml

ottlinger avatar May 28 '25 11:05 ottlinger

Thank you for your patience and hard work

Thanks for your in-depth review and extensive advice!

raboof avatar May 29 '25 07:05 raboof

hm, during a final check I see behavior that I can't quite place when no global gitignore exists - let me draft this until I find time to convince myself that this is OK :)

raboof avatar May 29 '25 07:05 raboof

hm, during a final check I see behavior that I can't quite place when no global gitignore exists - let me draft this until I find time to convince myself that this is OK :)

OK, the differences I observed are also present between 1.16.1 and the version on the current main branch, so not introduced by this PR.

raboof avatar May 30 '25 15:05 raboof

What is the difference?

LinkedIn: http://www.linkedin.com/in/claudewarren

On Fri 30 May 2025, 16:30 Arnout Engelen, @.***> wrote:

raboof left a comment (apache/creadur-rat#433) https://github.com/apache/creadur-rat/pull/433#issuecomment-2922709945

hm, during a final check I see behavior that I can't quite place when no global gitignore exists - let me draft this until I find time to convince myself that this is OK :)

OK, the differences I observed are also present between 1.16.1 and the version on the current main branch, so not introduced by this PR.

— Reply to this email directly, view it on GitHub https://github.com/apache/creadur-rat/pull/433#issuecomment-2922709945, or unsubscribe https://github.com/notifications/unsubscribe-auth/AASTVHXBK6WQT3KYWKRXSUL3BB2SLAVCNFSM6AAAAABWK2DV3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDSMRSG4YDSOJUGU . You are receiving this because you were mentioned.Message ID: @.***>

Claudenw avatar May 30 '25 18:05 Claudenw

What is the difference? LinkedIn: http://www.linkedin.com/in/claudewarren

It seems current master detects some files without licenses that the 1.16.1 didn't. At first glance it looks like the main branch is correct, but it might be worth a double-check - I put the list in a separate issue, https://issues.apache.org/jira/browse/RAT-477

raboof avatar May 31 '25 19:05 raboof