As a user, I want to have the ability to skip directories in a validation run

Open ralanis-jpl opened this issue 10 months ago • 15 comments

Checked for duplicates

No - I haven't checked

🧑‍🔬 User Persona(s)

Data Engineers and Data Providers

💪 Motivation

...so that I can produce Validate reports without WARNINGs regarding the absence of PDS4 labels for files/directories that are not intended to have them.

📖 Additional Details

For example, MSL archive bundles meet both PDS3 and PDS4 standards by containing the essential files needed to pass both validations. However, in the (PDS3) /EXTRAS directory there are three sizes of browse files: /FULL, /BROWSE, and /THUMBNAIL. Only the /BROWSE directory gets PDS4 XML labels added. The /FULL and /THUMBNAIL directories do not, yet Validate lists WARNINGs for every image within these two directories, needlessly lengthening the validation report. Being able to omit specific directories that are not of interest would save time and avoid needless reporting of files that are not listed in any collection_*.csv file.

Acceptance Criteria

Given When I perform Then I expect

⚙️ Engineering Details

No response

🎉 I&T

No response

ralanis-jpl avatar Mar 31 '25 19:03 ralanis-jpl

Thanks @ralanis-jpl we will add this to the backlog. Is this a blocker to anything you are trying to do? Or is this more of an inconvenience in triaging a validate run?

jordanpadams avatar Mar 31 '25 20:03 jordanpadams

It's not a blocker, but it has been noticeable as we're folding in more, still active, archives into the PDS3 to PDS4 migration. As you said, it would help with analyzing validation runs. Thanks.

ralanis-jpl avatar Mar 31 '25 21:03 ralanis-jpl

@jordanpadams @ralanis-jpl

Would it be possible to have the full content in one tree then have a script build a PDS4 tree that linked back to the full tree omitting the unwanted directories? You could then validate the PDS4 tree, and it would behave as you want. The script would also give you much better control of omissions that you may want later.
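A minimal sketch of the linked-tree idea above, in Python. It assumes the unwanted directories are given as paths relative to the bundle root and that validate will follow symbolic links, which this thread does not confirm. The function name `build_linked_tree` and its parameters are hypothetical.

```python
import os
from pathlib import Path


def build_linked_tree(source_root, dest_root, excluded_dirs):
    """Mirror source_root under dest_root using symlinks, skipping excluded_dirs.

    excluded_dirs holds paths relative to source_root, e.g. {"EXTRAS/FULL"}.
    Returns the number of files linked.
    """
    source_root = Path(source_root).resolve()
    dest_root = Path(dest_root)
    excluded = {Path(d) for d in excluded_dirs}
    linked = 0
    for current, dirnames, filenames in os.walk(source_root):
        rel = Path(current).relative_to(source_root)
        # Prune in place so os.walk never descends into an excluded directory.
        dirnames[:] = [d for d in dirnames if rel / d not in excluded]
        target_dir = dest_root / rel
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            link = target_dir / name
            if not link.is_symlink() and not link.exists():
                # Each link points back at the file in the full tree.
                link.symlink_to(Path(current) / name)
                linked += 1
    return linked
```

For the layout described in this issue, a call might look like `build_linked_tree("MSLNAV_1XXX", "MSLNAV_1XXX_pds4", {"EXTRAS/FULL", "EXTRAS/THUMBNAIL"})`, where the destination name is only an illustration; validate would then be pointed at the new tree.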

al-niessner avatar Apr 03 '25 17:04 al-niessner

@ralanis-jpl Based upon all the priorities we have in our backlog, unfortunately, I am going to need to keep this in the icebox for the time being.

As @al-niessner noted, I would recommend doing a find/tree on the file system, filtering the result down to what you want validate to look at, and feeding that in either as a manifest or via the CLI as a list of targets. When running the final bundle validation, you will need to just let it run, but that will hopefully be a last stop.
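The find-and-filter step could be sketched in Python roughly as follows. This is an illustration only: it assumes PDS4 labels end in `.xml`, that excluded directories are given relative to the bundle root, and that a manifest is a plain text file with one path per line (check the validate documentation for the actual manifest format). The names `list_label_targets` and `write_manifest` are hypothetical.

```python
import os
from pathlib import Path


def list_label_targets(bundle_root, excluded_dirs, suffix=".xml"):
    """Return label paths under bundle_root, skipping excluded_dirs.

    excluded_dirs holds paths relative to bundle_root, e.g. {"EXTRAS/FULL"}.
    """
    bundle_root = Path(bundle_root)
    excluded = {Path(d) for d in excluded_dirs}
    targets = []
    for current, dirnames, filenames in os.walk(bundle_root):
        rel = Path(current).relative_to(bundle_root)
        # Prune excluded directories and keep the traversal order stable.
        dirnames[:] = sorted(d for d in dirnames if rel / d not in excluded)
        for name in sorted(filenames):
            if name.lower().endswith(suffix):
                targets.append(str(Path(current) / name))
    return targets


def write_manifest(targets, manifest_path):
    """Write one target path per line (assumed manifest layout)."""
    Path(manifest_path).write_text("\n".join(targets) + "\n")
```

The resulting list can be passed as targets on the command line or saved with `write_manifest`. As noted above, a run over filtered targets does not replace the final, unfiltered bundle validation.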

jordanpadams avatar Apr 04 '25 00:04 jordanpadams

Let us know if this becomes a blocker for running validate, and we will reevaluate.

jordanpadams avatar Apr 04 '25 00:04 jordanpadams

For the Validate option of "-t", does one specify a directory only with its name or is the path also required? For example,

$ ./validate MSLNAV_1XXX -R pds4.bundle -D -t bundle.xml DATA EXTRAS/BROWSE -v 2 -r MSLNAV_1XXX_rpt_01.txt

(DATA/, EXTRAS/ and bundle.xml all sit at the same top level)

Thanks

ralanis-jpl avatar Apr 16 '25 20:04 ralanis-jpl

@ralanis-jpl

I am not an expert on this, so @jordanpadams may have to correct me.

The -R pds4.bundle option tells validate that the object pointed to by -t (in your case, bundle.xml) is a bundle, and that it should use everything it knows about bundles to process it; that is, use it as the root location, search all the directories it points to, and so on.

If the bundle is in the directory MSLNAV_1XXX, then it should be -t MSLNAV_1XXX/bundle.xml.

As written, validate thinks there are 4 bundles: MSLNAV_1XXX, bundle.xml, DATA, and EXTRAS/BROWSE.

If all you want to do is process the bundle: ./validate -D -v2 -r MSLNAV_1XXX_rpt_01.txt -R pds4.bundle -t MSLNAV_1XXX/bundle.xml

al-niessner avatar Apr 16 '25 21:04 al-niessner

or rather, $ ./validate -D -v 2 -r MSLNAV_1XXX_rpt_01.txt -R pds4.bundle -t MSLNAV_1XXX/bundle.xml MSLNAV_1XXX/DATA MSLNAV_1XXX/EXTRAS/BROWSE

ralanis-jpl avatar Apr 16 '25 22:04 ralanis-jpl

If you want it to process directories rather than bundles,

./validate -D -v 2 -r MSLNAV_1XXX_rpt_01.txt -R pds4.directory -t MSLNAV_1XXX MSLNAV_1XXX/DATA MSLNAV_1XXX/EXTRAS/BROWSE

It will not do what you want. It will walk EVERY directory in MSLNAV_1XXX.

When you give it more than one target, it is the union of those targets. Also, all targets should be of the same type that matches the -R value (or label when not specified).

al-niessner avatar Apr 16 '25 22:04 al-niessner

My intention was to carry out a 'bundle' validation with all of its integrity checking as well. I was trying to circumvent Validate's inability to ignore specific directories by instead targeting only the sub-directories that contain PDS4 labels.

ralanis-jpl avatar Apr 16 '25 23:04 ralanis-jpl

I suspected as much. Unfortunately, or fortunately depending on your outlook on bundles, validate tries to identify files that are not mentioned. I guess many of those that develop PDS bundles like this feature so it is fortunate.

al-niessner avatar Apr 16 '25 23:04 al-niessner

It makes sense if the bundle is a 100% PDS4 bundle. The bundles I am dealing with are "hybrid" PDS3/PDS4 bundles, meaning that not all of the directories are intended to be accounted for by the PDS4 validation software. Some are there solely for PDS3's sake. The nice thing about the old PDS3 validation software was that one could specify which directories to ignore. I had hoped PDS4 Validate could do the same. Thanks.

ralanis-jpl avatar Apr 16 '25 23:04 ralanis-jpl

The reason for hybrid bundles is to preserve the PDS3 format that users still want and have developed software for. At the same time, adding PDS4 labels to the data that is already there reduces duplication of data and meets the PDS3 to PDS4 migration mandate.

ralanis-jpl avatar Apr 16 '25 23:04 ralanis-jpl

@ralanis-jpl Understood on the migration. As @al-niessner mentioned, when running bundle validation, just point to the bundle.xml and nothing else. It will go through all sub-dirs with the current functionality of validate. Unfortunately, we have lots of other work to do, and since this is more of a nuisance than a blocker, we have given this a "could-have" priority and put it into our icebox for implementation at a later date. thanks!

jordanpadams avatar Apr 21 '25 16:04 jordanpadams

I too would like to see this improvement for my archive and migration tasks. I submitted a ticket, #1252, giving my input and use cases (and closed it) before a deeper search turned up this ticket and #1079.

I would like to see an --exclude files/directory option and perhaps an --exclude-list option that points to a list of files/directories to exclude.

tbarnes4 avatar May 21 '25 14:05 tbarnes4