As a user, I want to be able to skip files/dirs on file.not_referenced_in_label check

Open rgdeen opened this issue 1 year ago • 4 comments

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Anyone running validate

💪 Motivation

The referential integrity check's feature of reporting files that are not referenced in the label is a useful and welcome addition. However, there are times when we intentionally have files that are not officially part of the archive. It would be really useful to be able to specify a list of files and/or directories that are skipped for this check (actually, skipped for ALL checks - so validate pretends they do not exist).

The particular use case motivating this is the MSL hybrid pds3/4 bundle. It has a number of files that are part of the pds3 archive but are not part of pds4. For example, various pds3 boilerplate files, as well as two of the three types of browse products. There are also .XML files in the EXTRAS dir that are not labels. Being able to specify dirs to ignore would prevent these from throwing warnings (or in the XML file case, really serious fatal errors since they're not even PDS4 labels).

Having thousands of such warnings is a problem because it effectively hides any unexpected warnings. As it is, the file-not-referenced warning is useless for the MSL bundle.

A bonus would be wildcard support so we could also skip pds3 .LBL files wherever they occur (in the MSL case the LBLs are referenced from the pds4 labels, but that's not always the case).

With a number of hybrid bundles due to come out in the very near future (few months), this will become increasingly important.

📖 Additional Details

No response

Acceptance Criteria

Given When I perform Then I expect

⚙️ Engineering Details

No response

🎉 I&T

No response

rgdeen • Dec 03 '24 22:12

@rgdeen as an interim solution, would a flag to turn off this check suffice? Or would you prefer to provide some explicit exclusions to the run?

jordanpadams • Dec 03 '24 22:12

Well, we can always turn off referential integrity checking altogether. But then we lose a lot of useful functionality. A flag to turn off that check could help on an interim basis, but I think we really want an exclusion list... that way we keep all functionality. There's also the second (admittedly corner) case where there were non-pds4 XML files which it totally barfed on... the exclusion would cover that too, whereas turning off the not-referenced check would not. And the case where we want to exclude all *.LBL will come up soon... we definitely have hybrid bundles in development that do not point to the pds3 label, which makes the case for a wildcard too (although this did not come up in the MSL case).

Summary:

Directory exclusion - ignore things like EXTRAS dirs (I don't want to ignore all of EXTRAS, just some of the subdirs) or CATALOG

Wildcard-based file exclusion - ignore things like *.LBL or pds3 voldesc.cat (a single file is the degenerate case of a wildcard match)
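To make the two kinds of exclusion concrete, here is a minimal Python sketch of how such a filter might decide whether to skip a path. It is not how validate works today; the function name `is_excluded`, its parameters, and the example subdirectory `EXTRAS/HTML` are all hypothetical, and paths are assumed to be relative to the bundle root with `/` separators.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_excluded(rel_path, dir_excludes, file_patterns):
    """Return True if rel_path should be treated as if it did not exist.

    rel_path      -- path relative to the bundle root, '/'-separated
    dir_excludes  -- directories to skip entirely (hypothetical input)
    file_patterns -- shell-style wildcards matched against the file name
    """
    path = PurePosixPath(rel_path)

    # Directory exclusion: skip the directory itself and everything below it.
    for excluded in dir_excludes:
        excluded_dir = PurePosixPath(excluded)
        if path == excluded_dir or excluded_dir in path.parents:
            return True

    # Wildcard exclusion: match the file name wherever it occurs.
    # A literal name such as "voldesc.cat" is just a pattern with no wildcard.
    # fnmatchcase is case-sensitive on every platform.
    return any(fnmatchcase(path.name, pattern) for pattern in file_patterns)
```

With `dir_excludes=["EXTRAS/HTML", "CATALOG"]` and `file_patterns=["*.LBL", "voldesc.cat"]`, a file under `EXTRAS/HTML` is skipped while other files in `EXTRAS` are still checked. Matching is case-sensitive in this sketch, so a real option would need to decide how to treat names like `VOLDESC.CAT`.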

rgdeen • Dec 03 '24 23:12

There are a number of reasons why a node might want to place non-PDS4 files into a directory alongside a PDS4 archive. Since PDS4 is agnostic as to file systems, this is totally allowed. I also agree that it's great for Validate to check for non-PDS4 files in the directory, but something like a .gitignore file would seem to be a good practice.

matthewtiscareno • Dec 03 '24 23:12

I would also like to have this feature. I submitted ticket #1252 with my input and use cases, then closed it after a deeper search turned up this ticket.

I would like to see an --exclude files/directory option and perhaps an --exclude-list option that points to a list of files/directories to exclude.
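As a rough illustration of how those two options could fit together, here is a Python sketch using `argparse`. These options are only a proposal in this thread, not existing validate flags; the program name `validate-sketch`, the helper names, and the list-file format (one pattern per line, with blank lines and `#` comments ignored, loosely modeled on .gitignore) are all assumptions.

```python
import argparse


def parse_exclude_list(text):
    """Parse exclude-list text: one pattern per line.

    Blank lines and lines starting with '#' are ignored (assumed format).
    """
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def build_parser():
    # Hypothetical command line; only the two proposed options are modeled.
    parser = argparse.ArgumentParser(prog="validate-sketch")
    parser.add_argument("--exclude", nargs="*", default=[], metavar="PATTERN",
                        help="files/directories to skip")
    parser.add_argument("--exclude-list", metavar="FILE",
                        help="file listing files/directories to skip")
    return parser


def collect_exclusions(argv):
    """Merge patterns given inline with those read from the list file."""
    args = build_parser().parse_args(argv)
    patterns = list(args.exclude)
    if args.exclude_list:
        with open(args.exclude_list, encoding="utf-8") as handle:
            patterns += parse_exclude_list(handle.read())
    return patterns
```

For example, `collect_exclusions(["--exclude", "*.LBL", "CATALOG"])` yields the two inline patterns, and adding `--exclude-list` appends whatever the file contains. Keeping the list in a file would let a bundle carry its exclusions alongside the data, much like a .gitignore.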

tbarnes4 • May 21 '25 13:05