As a user, I want referential integrity checks on accumulating bundles to use the existing bundle
Checked for duplicates
Yes - I've already checked
🧑‍🔬 User Persona(s)
Node operator working with a piece of an accumulating bundle
💪 Motivation
I recently ran a referential integrity check on a huge backfill MSL/MMM bundle, which contained 2.5M data products for the PDS4 migration of older deliveries.
I got a number of "member not found" and "reference not found" warnings, e.g.
WARNING [warning.integrity.reference_not_found] A LID reference urn:nasa:pds:msl_mmm:document:msl_mmm_dpsis is referencing a logical identifier for a product not found in this bundle
WARNING [warning.integrity.member_not_found] The member 'urn:nasa:pds:msl_mmm:software' could not be found in any product within the given target.
It would be nice if we could provide an (optional) parameter to validate that says "here's the main bundle". It could then look there to resolve referential integrity checks that are not in the actual directory being validated. Note that there's no reason to actually validate products in the "main" bundle, just see if they exist. To that end, it's possible all we need are the inventory files for the main bundle.
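To make the request concrete, here is a minimal sketch of the lookup order being asked for: check the directory being validated first, then fall back to a set of LIDs known to exist in the main bundle. It is not how validate is implemented; the function names, the result keys, and the `example_product` and `missing` LIDs are hypothetical and for illustration only.

```python
def strip_version(identifier):
    """Return the LID part of a LID or LIDVID (drops any '::version' suffix)."""
    return identifier.split("::", 1)[0]


def resolve_references(references, local_lids, main_bundle_lids=frozenset()):
    """Classify each reference by where its target LID was found.

    local_lids: LIDs found in the directory being validated.
    main_bundle_lids: LIDs known to exist in the main bundle (existence only,
    nothing in the main bundle is validated).
    """
    resolved = {"local": [], "main_bundle": [], "unresolved": []}
    for ref in references:
        lid = strip_version(ref)
        if lid in local_lids:
            resolved["local"].append(ref)
        elif lid in main_bundle_lids:
            resolved["main_bundle"].append(ref)
        else:
            resolved["unresolved"].append(ref)
    return resolved


if __name__ == "__main__":
    # Hypothetical product LID in the incremental delivery.
    local = {"urn:nasa:pds:msl_mmm:data:example_product"}
    # LID from the warning above, assumed to exist in the main bundle.
    main = {"urn:nasa:pds:msl_mmm:document:msl_mmm_dpsis"}
    refs = [
        "urn:nasa:pds:msl_mmm:data:example_product::1.0",
        "urn:nasa:pds:msl_mmm:document:msl_mmm_dpsis",
        "urn:nasa:pds:msl_mmm:document:missing",  # hypothetical, in neither set
    ]
    for where, found in resolve_references(refs, local, main).items():
        print(where, found)
```

Only references that land in "unresolved" would still need to be reported; anything found in the main bundle set would no longer produce a reference_not_found warning.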
📖 Additional Details
No response
Acceptance Criteria
Given When I perform Then I expect
⚙️ Engineering Details
No response
🎉 I&T
No response
@rgdeen how was validate executed where you are seeing this behavior?
We had a partial bundle from the DP with the data but not the documents, because those didn't change in this release.
If you're asking about command-line parameters....
validate --target $1 --skip-content-validation --skip-product-validation -R pds4.bundle
it was validate 3.6.3.
@rgdeen the referential integrity checks should work across the entire bundle, not just in a specific directory. are you saying that urn:nasa:pds:msl_mmm:document:msl_mmm_dpsis does exist in the bundle where this was being executed?
The msl_mmm_dpsis did NOT exist locally on the machine where validate was being run.... because it was a partial bundle, an increment for an accumulating bundle, designed to overlay over the main bundle for release. That part of the bundle did not change, so it should not have been in the incremental overlay being validated. I see several possible options:
- tell it where online to find the main bundle and it can go look there automatically
- use the registry for RI checks (in which case this is probably moot, but that's a huge change)
- option to provide a directory where the "main" bundle exists (would require downloading parts of it)
- option to provide a dir or filename to the inventory files for the "main" bundle, which is all you really need for RI checking (assumption is the inventories are correct).
Remember it's not just the documents, it's any products from prior releases that might be referenced as source products for something in the current release.
Bottom line: validate's RI checking does not work well for accumulating bundles because of this issue. There's also no practical way to validate a bundle after an accumulating merge without downloading the whole thing, which is prohibitively expensive. It would be really nice to know that I got the inventories right during the accumulating merge.
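For the inventory-file option, a rough sketch of what collecting the main bundle's LIDs could look like is below. It assumes each inventory is a CSV whose rows are a member status (P or S) followed by a LID or LIDVID, and that every `*.csv` under the given directory is a collection inventory. Both are simplifying assumptions, and the function name is hypothetical; this is not an existing validate option.

```python
import csv
from pathlib import Path


def load_inventory_lids(inventory_dir):
    """Collect the LIDs listed in every *.csv file under inventory_dir.

    Assumes each row is 'status,identifier' where status is P or S and
    identifier is a LID or LIDVID. Rows that do not fit are skipped.
    """
    lids = set()
    for path in sorted(Path(inventory_dir).rglob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.reader(handle):
                if len(row) < 2:
                    continue  # blank or malformed row
                status, identifier = row[0].strip(), row[1].strip()
                if status not in ("P", "S"):
                    continue  # not an inventory record
                # Keep only the LID, dropping any '::version' suffix.
                lids.add(identifier.split("::", 1)[0])
    return lids
```

The resulting set could then be consulted whenever a reference is not found in the directory being validated. Note that inventories list the products in a collection, so a collection-level member such as urn:nasa:pds:msl_mmm:software would presumably still need the main bundle's bundle or collection labels to resolve.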