Refactored `validate-refs` to use new OpenSearch Serverless and registry-common library
🗒️ Summary
Use registry-common for access to opensearch via Java SDK v2
⚙️ Test Data and/or Report
Had to remove the automated RI checks that did not seem to work anyway (they were causing errors but ignored by checkers) because need a mock for registry-common ConnectionFactory and RestClient. Loaded a DB by hand using ref data repository then ran RI. It ran to completion with no unexpected errors.
♻️ Related Issues
Closes #895 Depends on NASA-PDS/registry-common#57 depends on NASA-PDS/registry-common#89 depends on NASA-PDS/harvest#191
@jasonmlkang @tloubrieu-jpl
Stuck at a bug that requires validate pom include jakarta. If we are moving validate-refs to harvest (my recommendation because they share a large amount of code via config) then this would be the time to do it.
@al-niessner let's maybe move forward with this adding this JAR, and then we can create a ticket down the road to refactor, as needed. I like this validate-refs being in the same package as validate because users don't want to download or care to download 2 packages to do "validation". But we can climb that tree when we get to it.
@al-niessner also looks like the tests are failing here.
@jordanpadams
This cannot build until we update registry-common and harvest. Even if it is complete, it cannot work here or in the wild until those repos are updated. Currently, it only works in my deve environment.
@al-niessner I think we have registry-common merged and some data should be in the AWS OpenSearch Serverless. Do we want to get back onto this task?
Sure. Still stuck with the big question of do we move reference checker to harvest since it shares the configuration or make validate artifact dependent on harvest artifact. I still suggest moving reference checker to harvest since it has zero dependencies on validate.
I understand the grouping issue. I suggest solving that with a a pseudo artifact that installs many tools in some grouping rather than actually trying to group stuff by use rather than software dependency.
@al-niessner looks like this doesn't compile.
Compilation failure
[ERROR] /Users/jpadams/proj/pds/pdsen/workspace/validate/src/main/java/gov/nasa/pds/validate/ri/AuthInformation.java:[6,32] package gov.nasa.pds.harvest.cfg does not exist
Harvest pom?
On Thu, Sep 26, 2024, 11:33 Jordan Padams @.***> wrote:
@al-niessner https://github.com/al-niessner looks like this doesn't compile.
Compilation failure [ERROR] /Users/jpadams/proj/pds/pdsen/workspace/validate/src/main/java/gov/nasa/pds/validate/ri/AuthInformation.java:[6,32] package gov.nasa.pds.harvest.cfg does not exist
— Reply to this email directly, view it on GitHub https://github.com/NASA-PDS/validate/pull/906#issuecomment-2377659239, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIUBISS2HCKUK6VGJ6WRV3ZYRHN3AVCNFSM6AAAAABIF7N5KSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNZXGY2TSMRTHE . You are receiving this because you were mentioned.Message ID: @.***>
@jordanpadams
You were not at the latest versions. Updated but now buildbot cannot find the latest versions of registry-common (2.1.0) or harvest (4.1.0). Get those out and you should be alright.
@al-niessner now that I figured out how to get it to compile once we update registry-common and harvest, how do I run this? do I need to give it a harvest config? can I still give it what I gave it before (a txt file with a URL and a path to an auth file)?
The --help output looks the same, but I'm not sure what I am supposed to put for some of this stuff.
usage: validate-refs LIDVID LABEL-FILEPATH MANIFEST-FILEPATH [-a
<auth-file>] [-c <harvest-config>] [-h <arg>] [-r
<registry-connection>] [-t <count>] [-verbose <arg>]
Checks that (1) all product references within a given product and (2) any
aggregrate product references (bundles -> collections -> products) exist
in the Registry OpenSearch DB or Search API.
Expected positional arguments are either a LIDVID, LABEL-FILEPATH, or
MANIFEST-FILEPATH.
- A LIDVID must start with urn:.
- A LABEL-FILEPATH must be a well formed PDS XML file.
- A MANIFEST-FILEPATH is one item per line with an item being a lidvid
or label. Each line must be terminated by a LF.
Multiple arguments may be given in any order, for example:
> validate-refs urn:nasa:pds:foo::1.0 label.xml urn:nasa:pds:bar::2.0
manifest.txt
-a,--auth-opensearch <auth-file> file with the credential
content to have full,
direct read-only access
to the Registry
OpenSearch DB
-c,--harvest-config <harvest-config> file used by harvest to
ingest data that
contains the appropriate
authorization and
registry connection
information
-h,--help <arg> show this text and exit
-r,--registry-connection <registry-connection> URL point to the
registry connection
information usually of
the form
app://connection/direct/
localhost.xml
-t,--threads <count> process the lidvids in
parallel (multiple
threads) with this
argument being the
maximum number of
threads
-verbose,--verbose <arg> set logging level to
INFO
An auth-file is a text file of the Java property format with two
variables, 'user' and 'password' for example:
user=janedoe
password=mypassword
Both -a and -r are required with using them and are mutually exclusive
with -c
@jordanpadams
Should work with -a and -r now. I cleaned up the CLI help too. Yanked dependency on harvest from POM. Good luck.
@jordanpadams
Still not stable. I do not understand this at all; meaning, it is more fun than usual because it is so challenging.
I am doing this locally mvn package test but it does not do the summary at the end so I cannot tell what really failed and what was supposed to fail. How do I get the summary like in the github actions window?
@al-niessner I think it is just mvn clean test (mvn package runs the tests already)
@jordanpadams
It is ready for your testing again. I have no idea what secret it thinks we have revealed, but cannot believe there is anything really there.
@al-niessner I will update this secrets baseline, but here are the instructions for recreating and auditing the secrets baseline.
FYI, the instructions are here: https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Git-and-Github-Guide#detect-secrets
@jordanpadams @nutjob4life
Thanks for the link. I clicked on the one in the actions console and it gave me a 404.
@jordanpadams
Would it help to say, "The value, not any of the attributes, of <registry> in your harvest config file."? If you did not supply that value, then you will bet that wonderful XML parsing error.
@al-niessner copy that. could we use the same config registry-mgr uses instead? From @tloubrieu-jpl instructions here, this looks like some XML config file, which makes sense since it really needs to be run with Cognito in the loop in order to authorize access, no?
@jordanpadams
The value of <registry> is the same as registry-mgr. Reducing to that saved us dependency problems with harvest. I had originally embedded them into registry-common using app:// but @tloubrieu-jpl moved them to files that the user must maintain. Albeit, using external files is supported even in the original implementation; meaning, he exercised an option not took a different path.
Now that you know how it is all tied together, what words would be helpful in the help?
@al-niessner right. I think I am using the right URL for one of our registries, and it is not working. Were you able to successfully test with a dev or production registry?
@jordanpadams
Can you post or email me your command line? I will see if I can figure out what is wrong.
The solution is not complete because it does not check for the products in all the nodes registry.
@tloubrieu-jpl
Do we not have an index that maps to all of the indices? Would we prefer "index_a,index_b,index_c" or "index_a;index_b;index_c" or either?
@al-niessner The problem here is the current AOSS auth layer does not support this. I may be able to test this with an admin user, but a regular node user would not be able to do this without a change to that layer. blocked by https://github.com/NASA-PDS/registry/issues/348
I'm going to merge this because it kind of works, but it requires the entire bundle and all it's references be loaded into the same registry index, which is not how the system is architected.
updated original comment to no longer resolve #621, since it doesn't really work as intended