validate icon indicating copy to clipboard operation
validate copied to clipboard

Refactored `validate-refs` to use new OpenSearch Serverless and registry-common library

Open al-niessner opened this issue 1 year ago • 12 comments

🗒️ Summary

Use registry-common for access to opensearch via Java SDK v2

⚙️ Test Data and/or Report

Had to remove the automated RI checks that did not seem to work anyway (they were causing errors but ignored by checkers) because need a mock for registry-common ConnectionFactory and RestClient. Loaded a DB by hand using ref data repository then ran RI. It ran to completion with no unexpected errors.

♻️ Related Issues

Closes #895 Depends on NASA-PDS/registry-common#57 depends on NASA-PDS/registry-common#89 depends on NASA-PDS/harvest#191

al-niessner avatar May 23 '24 16:05 al-niessner

@jasonmlkang @tloubrieu-jpl

Stuck at a bug that requires validate pom include jakarta. If we are moving validate-refs to harvest (my recommendation because they share a large amount of code via config) then this would be the time to do it.

al-niessner avatar May 23 '24 16:05 al-niessner

@al-niessner let's maybe move forward with this adding this JAR, and then we can create a ticket down the road to refactor, as needed. I like this validate-refs being in the same package as validate because users don't want to download or care to download 2 packages to do "validation". But we can climb that tree when we get to it.

jordanpadams avatar May 23 '24 21:05 jordanpadams

@al-niessner also looks like the tests are failing here.

jordanpadams avatar May 23 '24 21:05 jordanpadams

@jordanpadams

This cannot build until we update registry-common and harvest. Even if it is complete, it cannot work here or in the wild until those repos are updated. Currently, it only works in my deve environment.

al-niessner avatar May 24 '24 15:05 al-niessner

@al-niessner I think we have registry-common merged and some data should be in the AWS OpenSearch Serverless. Do we want to get back onto this task?

jordanpadams avatar Jun 20 '24 21:06 jordanpadams

Sure. Still stuck with the big question of do we move reference checker to harvest since it shares the configuration or make validate artifact dependent on harvest artifact. I still suggest moving reference checker to harvest since it has zero dependencies on validate.

I understand the grouping issue. I suggest solving that with a a pseudo artifact that installs many tools in some grouping rather than actually trying to group stuff by use rather than software dependency.

al-niessner avatar Jun 20 '24 22:06 al-niessner

@al-niessner looks like this doesn't compile.

Compilation failure
[ERROR] /Users/jpadams/proj/pds/pdsen/workspace/validate/src/main/java/gov/nasa/pds/validate/ri/AuthInformation.java:[6,32] package gov.nasa.pds.harvest.cfg does not exist

jordanpadams avatar Sep 26 '24 18:09 jordanpadams

Harvest pom?

On Thu, Sep 26, 2024, 11:33 Jordan Padams @.***> wrote:

@al-niessner https://github.com/al-niessner looks like this doesn't compile.

Compilation failure [ERROR] /Users/jpadams/proj/pds/pdsen/workspace/validate/src/main/java/gov/nasa/pds/validate/ri/AuthInformation.java:[6,32] package gov.nasa.pds.harvest.cfg does not exist

— Reply to this email directly, view it on GitHub https://github.com/NASA-PDS/validate/pull/906#issuecomment-2377659239, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIUBISS2HCKUK6VGJ6WRV3ZYRHN3AVCNFSM6AAAAABIF7N5KSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNZXGY2TSMRTHE . You are receiving this because you were mentioned.Message ID: @.***>

al-niessner avatar Sep 26 '24 18:09 al-niessner

@jordanpadams

You were not at the latest versions. Updated but now buildbot cannot find the latest versions of registry-common (2.1.0) or harvest (4.1.0). Get those out and you should be alright.

al-niessner avatar Sep 26 '24 20:09 al-niessner

@al-niessner now that I figured out how to get it to compile once we update registry-common and harvest, how do I run this? do I need to give it a harvest config? can I still give it what I gave it before (a txt file with a URL and a path to an auth file)?

jordanpadams avatar Sep 26 '24 22:09 jordanpadams

The --help output looks the same, but I'm not sure what I am supposed to put for some of this stuff.

usage: validate-refs LIDVID LABEL-FILEPATH MANIFEST-FILEPATH [-a
       <auth-file>] [-c <harvest-config>] [-h <arg>] [-r
       <registry-connection>] [-t <count>] [-verbose <arg>]

Checks that (1) all product references within a given product and (2) any
aggregrate product references (bundles -> collections -> products) exist
in the Registry OpenSearch DB or Search API.

Expected positional arguments are either a LIDVID, LABEL-FILEPATH, or
MANIFEST-FILEPATH.
   - A LIDVID must start with urn:.
   - A LABEL-FILEPATH must be a well formed PDS XML file.
   - A MANIFEST-FILEPATH is one item per line with an item being a lidvid
or label. Each line must be terminated by a LF.

Multiple arguments may be given in any order, for example:
   > validate-refs urn:nasa:pds:foo::1.0 label.xml urn:nasa:pds:bar::2.0
manifest.txt

 -a,--auth-opensearch <auth-file>                 file with the credential
                                                  content to have full,
                                                  direct read-only access
                                                  to the Registry
                                                  OpenSearch DB
 -c,--harvest-config <harvest-config>             file used by harvest to
                                                  ingest data that
                                                  contains the appropriate
                                                  authorization and
                                                  registry connection
                                                  information
 -h,--help <arg>                                  show this text and exit
 -r,--registry-connection <registry-connection>   URL point to the
                                                  registry connection
                                                  information usually of
                                                  the form
                                                  app://connection/direct/
                                                  localhost.xml
 -t,--threads <count>                             process the lidvids in
                                                  parallel (multiple
                                                  threads) with this
                                                  argument being the
                                                  maximum number of
                                                  threads
 -verbose,--verbose <arg>                         set logging level to
                                                  INFO

An auth-file is a text file of the Java property format with two
variables, 'user' and 'password' for example:
      user=janedoe
      password=mypassword

Both -a and -r are required with using them and are mutually exclusive
with -c

jordanpadams avatar Sep 26 '24 22:09 jordanpadams

@jordanpadams

Should work with -a and -r now. I cleaned up the CLI help too. Yanked dependency on harvest from POM. Good luck.

al-niessner avatar Oct 09 '24 16:10 al-niessner

@jordanpadams

Still not stable. I do not understand this at all; meaning, it is more fun than usual because it is so challenging.

I am doing this locally mvn package test but it does not do the summary at the end so I cannot tell what really failed and what was supposed to fail. How do I get the summary like in the github actions window?

al-niessner avatar Oct 24 '24 21:10 al-niessner

@al-niessner I think it is just mvn clean test (mvn package runs the tests already)

jordanpadams avatar Oct 24 '24 21:10 jordanpadams

@jordanpadams

It is ready for your testing again. I have no idea what secret it thinks we have revealed, but cannot believe there is anything really there.

al-niessner avatar Oct 31 '24 17:10 al-niessner

@al-niessner I will update this secrets baseline, but here are the instructions for recreating and auditing the secrets baseline.

jordanpadams avatar Oct 31 '24 21:10 jordanpadams

FYI, the instructions are here: https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Git-and-Github-Guide#detect-secrets

nutjob4life avatar Oct 31 '24 21:10 nutjob4life

@jordanpadams @nutjob4life

Thanks for the link. I clicked on the one in the actions console and it gave me a 404.

al-niessner avatar Oct 31 '24 22:10 al-niessner

@jordanpadams

Would it help to say, "The value, not any of the attributes, of <registry> in your harvest config file."? If you did not supply that value, then you will bet that wonderful XML parsing error.

al-niessner avatar Nov 01 '24 19:11 al-niessner

@al-niessner copy that. could we use the same config registry-mgr uses instead? From @tloubrieu-jpl instructions here, this looks like some XML config file, which makes sense since it really needs to be run with Cognito in the loop in order to authorize access, no?

jordanpadams avatar Nov 01 '24 19:11 jordanpadams

@jordanpadams

The value of <registry> is the same as registry-mgr. Reducing to that saved us dependency problems with harvest. I had originally embedded them into registry-common using app:// but @tloubrieu-jpl moved them to files that the user must maintain. Albeit, using external files is supported even in the original implementation; meaning, he exercised an option not took a different path.

Now that you know how it is all tied together, what words would be helpful in the help?

al-niessner avatar Nov 01 '24 20:11 al-niessner

@al-niessner right. I think I am using the right URL for one of our registries, and it is not working. Were you able to successfully test with a dev or production registry?

jordanpadams avatar Nov 05 '24 17:11 jordanpadams

@jordanpadams

Can you post or email me your command line? I will see if I can figure out what is wrong.

al-niessner avatar Nov 05 '24 17:11 al-niessner

The solution is not complete because it does not check for the products in all the nodes registry.

tloubrieu-jpl avatar Nov 12 '24 21:11 tloubrieu-jpl

@tloubrieu-jpl

Do we not have an index that maps to all of the indices? Would we prefer "index_a,index_b,index_c" or "index_a;index_b;index_c" or either?

al-niessner avatar Nov 12 '24 22:11 al-niessner

@al-niessner The problem here is the current AOSS auth layer does not support this. I may be able to test this with an admin user, but a regular node user would not be able to do this without a change to that layer. blocked by https://github.com/NASA-PDS/registry/issues/348

jordanpadams avatar Nov 21 '24 20:11 jordanpadams

I'm going to merge this because it kind of works, but it requires the entire bundle and all it's references be loaded into the same registry index, which is not how the system is architected.

updated original comment to no longer resolve #621, since it doesn't really work as intended

jordanpadams avatar Nov 25 '24 21:11 jordanpadams