Aardvark Support
Identify Aardvark implications for GeoCombine.
- [x] Update the GeoCombine rake task to account for Aardvark. (The `geocombine:index` rake task has a hard-coded `layer_id_s` field statement. That field is changed to `gbl_wxsIdentifier_s` in Aardvark.) (Addressed by https://github.com/OpenGeoMetadata/GeoCombine/commit/f05d3441ac9efa8c1d2f68ac3c050b02c0002f79)
- [ ] Update GeoblacklightHarvester to account for Aardvark. (`layer_slug_s` and `dc_source_s` are present in `spec/lib/geo_blacklight_harvester_spec.rb`.)
- [x] #121
- [ ] #142
- [ ] #156
- [x] https://github.com/OpenGeoMetadata/GeoCombine/pull/163
Do we want two separate rake tasks (`geocombine:index-aardvark` and `geocombine:index`)? Will mixed ingesting work in a front end?
> Do we want two separate rake tasks (`geocombine:index-aardvark` and `geocombine:index`)? Will mixed ingesting work in a front end?
I would also like to know this. Being able to ingest both Aardvark and 1.0 records would be really useful for our case.
> Do we want two separate rake tasks (`geocombine:index-aardvark` and `geocombine:index`)? Will mixed ingesting work in a front end?
Nope – you just need to set an environment variable. To index Aardvark records, you can set:
```sh
SCHEMA_VERSION='Aardvark' bundle exec rake geocombine:index
```
For 1.0 records, you don't need to do anything, since it is currently the default – although #163 proposes changing that, since Aardvark should really be the default now. You can force it to use 1.0 with:
```sh
SCHEMA_VERSION='1.0' bundle exec rake geocombine:index
```
Thanks for the reply. Now that the Schema V1 to Aardvark migrator is working, I wonder if it would be possible to ingest 1.0 records and migrate them to Aardvark. For example, our instance uses Aardvark, but I would like to ingest 1.0 records from other portals and have them be migrated to Aardvark automatically. What would it take to add that functionality?
I think the way I'd do that is to write a small script or rake task (probably we could even make it part of GeoCombine later) that takes a path to a directory as its argument (most likely the cloned OpenGeoMetadata repo for one institution). You could add methods to or adapt the `GeoCombine::Harvester` class for this, or update the `V1AardvarkMigrator` to work on whole directories.
The task would need to make two passes:
- Visit all the v1 records in the repository to build a collection ID map. If a record has `dct_isPartOf_sm` set, add its values to the keys (collection names). If a record has a `dc_type_s` of `Collection`, add its `layer_slug_s` as the value for the key that matches its `dc_title_s` (collection layer ids). The result looks something like:
```ruby
id_map = {
  'My Collection 1' => 'institution:my-collection-1',
  'My Collection 2' => 'institution:my-collection-2'
}
```
- Convert all the v1 records in the repository to Aardvark, using the `V1AardvarkMigrator` and passing in the collection ID map from step 1. You could save the resulting Aardvark JSON file next to the v1 file it was generated from, perhaps with a suffix like `-aardvark.json`, or put all of them into a new directory (either is fine for the indexer).
If you do it this way, you can just use `rake geocombine:index` and it will see all of the new, generated Aardvark files in each institution's repository and index those.
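For concreteness, here's a minimal sketch of such a task as a standalone Ruby script. This is a hypothetical illustration, not part of GeoCombine: it assumes the institution repo stores records as `geoblacklight.json` files, and that `GeoCombine::Migrators::V1AardvarkMigrator` takes a parsed v1 hash plus a collection ID map and exposes a `run` method; check the migrator in your installed version for the exact interface.

```ruby
# Hypothetical two-pass v1 to Aardvark migration script (illustration only).
require 'json'
require 'geo_combine' # older versions may need an explicit require of the migrator file

repo_path = ARGV.fetch(0) # e.g. a cloned OpenGeoMetadata institution repo
v1_paths = Dir.glob(File.join(repo_path, '**', 'geoblacklight.json'))

# Pass 1: build the collection ID map from records typed as collections.
id_map = {}
v1_paths.each do |path|
  record = JSON.parse(File.read(path))
  id_map[record['dc_title_s']] = record['layer_slug_s'] if record['dc_type_s'] == 'Collection'
end

# Pass 2: convert each v1 record, resolving collection names to ids via the
# map, and write the Aardvark JSON next to the file it was generated from.
v1_paths.each do |path|
  record = JSON.parse(File.read(path))
  aardvark = GeoCombine::Migrators::V1AardvarkMigrator.new(
    v1_hash: record,
    collection_id_map: id_map
  ).run
  File.write(path.sub(/\.json\z/, '-aardvark.json'), JSON.pretty_generate(aardvark))
end
```

Building the map in a separate first pass keeps the conversion itself stateless, so individual records can be migrated in any order.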
Per a conversation at a sprint standup meeting, we should consider updating the GeoCombine test fixture. It seems out of sync with the GeoBlacklight test fixture. Maybe that's okay, but it should probably be part of the process for adding Aardvark support generally.
I updated the `full_geoblacklight.json` and `full_geoblacklight_aardvark.json` fixtures as part of #143, so that I could test that the migrator turns the former into the latter. If those files don't represent accurate GeoBlacklight documents, though, we should definitely correct that... let me know if there are any issues you find!