As a user, I want referential integrity checks on accumulating bundles to use existing bundle

Open rgdeen opened this issue 1 year ago • 4 comments

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Node operator working with a piece of an accumulating bundle

💪 Motivation

I recently ran a referential integrity check on a huge backfill MSL/MMM bundle, which contained 2.5M data products for the PDS4 migration of older deliveries.

I got a number of member or reference not found errors, e.g.

  WARNING  [warning.integrity.reference_not_found]   A LID reference urn:nasa:pds:msl_mmm:document:msl_mmm_dpsis is referencing a logical identifier for a product not found in this bundle


  WARNING  [warning.integrity.member_not_found]   The member 'urn:nasa:pds:msl_mmm:software' could not be found in any product within the given target.

It would be nice if we could provide an (optional) parameter to validate that says "here's the main bundle". It could then look there to resolve referential integrity checks that are not in the actual directory being validated. Note that there's no reason to actually validate products in the "main" bundle, just see if they exist. To that end, it's possible all we need are the inventory files for the main bundle.
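As a rough illustration of the inventory-only idea, here is a minimal Python sketch, not part of validate. It assumes the main bundle's collection inventories are available locally as `*.csv` files in the usual two-column PDS4 layout (member status `P` or `S`, then a LID or LIDVID). The function names `load_inventory_lids` and `still_missing` and the directory layout are hypothetical. Note that collection-level LIDs such as urn:nasa:pds:msl_mmm:software are normally listed in the bundle label rather than in a collection inventory, so this sketch only covers product-level references.

```python
import csv
from pathlib import Path


def load_inventory_lids(inventory_dir):
    """Collect LIDs from every *.csv inventory under inventory_dir (assumed layout)."""
    lids = set()
    for path in sorted(Path(inventory_dir).rglob("*.csv")):
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                # Column 1 is the member status (P or S); column 2 is a LID or LIDVID.
                lidvid = row[1].strip()
                lids.add(lidvid.split("::", 1)[0])  # drop the ::version suffix, if any
    return lids


def still_missing(reference_lids, main_bundle_lids):
    """Return the references that the main bundle's inventories cannot resolve."""
    return sorted(set(reference_lids) - set(main_bundle_lids))
```

Only existence is checked here; nothing in the main bundle is opened or validated, which matches the intent described above.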

📖 Additional Details

No response

Acceptance Criteria

Given When I perform Then I expect

⚙️ Engineering Details

No response

🎉 I&T

No response

Feb 25 '25 00:02 rgdeen

@rgdeen how was validate executed where you are seeing this behavior?

Apr 02 '25 20:04 jordanpadams

We had a partial bundle from the DP with the data but not the documents, because those didn't change in this release.

If you're asking about command-line parameters....

validate --target $1 --skip-content-validation --skip-product-validation -R pds4.bundle

It was validate 3.6.3.

Apr 02 '25 21:04 rgdeen

@rgdeen the referential integrity checks should work across the entire bundle, not just in a specific directory. Are you saying that urn:nasa:pds:msl_mmm:document:msl_mmm_dpsis does exist in the bundle where this was being executed?

Jun 16 '25 19:06 jordanpadams

The msl_mmm_dpsis product did NOT exist locally on the machine where validate was being run, because the target was a partial bundle: an increment for an accumulating bundle, designed to overlay the main bundle for release. That part of the bundle did not change, so it should not have been in the incremental overlay being validated. I see several possible options:

  1. tell it where online to find the main bundle and it can go look there automatically
  2. use the registry for RI checks (in which case this is probably moot, but that's a huge change)
  3. option to provide a directory where the "main" bundle exists (would require downloading parts of it)
  4. option to provide a dir or filename to the inventory files for the "main" bundle, which is all you really need for RI checking (assumption is the inventories are correct).
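Until something like option 4 exists in validate, it could be approximated outside the tool by post-filtering the report. The sketch below is an assumption-laden illustration, not validate behavior: it assumes the report is plain text with one warning per line in the format quoted earlier in this issue, and that `known_lids` is a set of LIDs already gathered from the main bundle's inventories. The function name `unresolved_warnings` is hypothetical.

```python
import re

# Matches a URN-style identifier such as the LIDs quoted in the warnings above.
LID_PATTERN = re.compile(r"urn:[A-Za-z0-9_.:\-]+")

# The two referential integrity warning codes that appear in this issue.
RI_CODES = (
    "warning.integrity.reference_not_found",
    "warning.integrity.member_not_found",
)


def unresolved_warnings(report_lines, known_lids):
    """Return only the RI warnings whose LID is absent from known_lids."""
    kept = []
    for line in report_lines:
        if not any(code in line for code in RI_CODES):
            continue  # not a referential integrity warning; ignored by this filter
        match = LID_PATTERN.search(line)
        if match is None:
            kept.append(line)  # no LID found, so keep it for manual review
            continue
        # Drop a ::version suffix and any trailing punctuation picked up by the regex.
        lid = match.group(0).split("::", 1)[0].rstrip(".:")
        if lid not in known_lids:
            kept.append(line)
    return kept
```

Any warning that survives the filter refers to something found neither in the increment nor in the main bundle's inventories, which is the case that actually needs attention.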

Remember it's not just the documents, it's any products from prior releases that might be referenced as source products for something in the current release.

Bottom line, validate RI does not work well for accumulating bundles... due to this issue. And there's no practical way to validate a bundle after an accumulating merge either, without downloading the whole thing which is prohibitively expensive. It would be really nice to know that I got the inventories right during the accumulating merge.

Jun 16 '25 22:06 rgdeen