Aardvark Support
Identify Aardvark implications for GeoCombine.
- [x] Update the GeoCombine rake task to account for Aardvark. (The `geocombine:index` rake task has a hard-coded `layer_id_s` field statement. That field is changed to `gbl_wxsIdentifier_s` in Aardvark.) (Addressed by https://github.com/OpenGeoMetadata/GeoCombine/commit/f05d3441ac9efa8c1d2f68ac3c050b02c0002f79)
- [ ] Update GeoblacklightHarvester to account for Aardvark. (`layer_slug_s` and `dc_source_s` are present in `spec/lib/geo_blacklight_harvester_spec.rb`.)
- [x] #121
- [ ] #142
- [ ] #156
- [x] https://github.com/OpenGeoMetadata/GeoCombine/pull/163
Do we want two separate rake tasks (`geocombine:index-aardvark` and `geocombine:index`)? Will mixed ingesting work in a front end?
> Do we want two separate rake tasks (`geocombine:index-aardvark` and `geocombine:index`)? Will mixed ingesting work in a front end?
I would also like to know this. Being able to ingest both Aardvark and 1.0 records would be really useful for our case.
> Do we want two separate rake tasks (`geocombine:index-aardvark` and `geocombine:index`)? Will mixed ingesting work in a front end?
Nope – you just need to set an environment variable. To index Aardvark records, you can set:
```sh
SCHEMA_VERSION='Aardvark' bundle exec rake geocombine:index
```
For 1.0 records, you don't need to do anything, since it is currently the default – although #163 proposes changing that, since Aardvark should really be the default now. You can force it to use 1.0 with:
```sh
SCHEMA_VERSION='1.0' bundle exec rake geocombine:index
```
Thanks for the reply. Now that the Schema V1 to Aardvark migrator is working, I wonder if it would be possible to ingest 1.0 records and migrate them to Aardvark. For example, our instance uses Aardvark, but I would like to ingest 1.0 records from other portals and have them be migrated to Aardvark automatically. What would it take to add that functionality?
I think the way I'd do that is to write a small script or rake task (probably we could even make it part of GeoCombine later) that takes a path to a directory as its argument (most likely the cloned OpenGeoMetadata repo for one institution). You could add methods to or adapt the `GeoCombine::Harvester` class for this, or update the `V1AardvarkMigrator` to work on whole directories.
The task would need to make two passes:
- Visit all the v1 records in the repository to build a collection ID map. If a record has `dct_isPartOf_sm` set, add its values to the keys (collection names). If a record has a `dc_type_s` of `Collection`, add its `layer_slug_s` as the value for the key that matches its `dc_title_s` (collection layer ids). The result looks something like:
```ruby
id_map = {
  'My Collection 1' => 'institution:my-collection-1',
  'My Collection 2' => 'institution:my-collection-2'
}
```
- Convert all the v1 records in the repository to Aardvark, using the `V1AardvarkMigrator` and passing in the collection ID map from step 1. You could save the resulting Aardvark JSON file next to the v1 file it was generated from, perhaps with a suffix like `-aardvark.json`, or put all of them into a new directory (either is fine for the indexer).
If you do it this way, you can just use `rake geocombine:index` and it will see all of the new, generated Aardvark files in each institution's repository and index those.
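For concreteness, here's a minimal sketch of such a task as a standalone Ruby script. This is a hypothetical illustration, not part of GeoCombine: it assumes the institution repo stores records as `geoblacklight.json` files, and that `GeoCombine::Migrators::V1AardvarkMigrator` takes a parsed v1 hash plus a collection ID map and exposes a `run` method; check the migrator in your installed version for the exact interface.

```ruby
# Hypothetical two-pass v1 to Aardvark migration script (illustration only).
require 'json'
require 'geo_combine' # older versions may need an explicit require of the migrator file

repo_path = ARGV.fetch(0) # e.g. a cloned OpenGeoMetadata institution repo
v1_paths = Dir.glob(File.join(repo_path, '**', 'geoblacklight.json'))

# Pass 1: build the collection ID map from records typed as collections.
id_map = {}
v1_paths.each do |path|
  record = JSON.parse(File.read(path))
  id_map[record['dc_title_s']] = record['layer_slug_s'] if record['dc_type_s'] == 'Collection'
end

# Pass 2: convert each v1 record, resolving collection names to ids via the
# map, and write the Aardvark JSON next to the file it was generated from.
v1_paths.each do |path|
  record = JSON.parse(File.read(path))
  aardvark = GeoCombine::Migrators::V1AardvarkMigrator.new(
    v1_hash: record,
    collection_id_map: id_map
  ).run
  File.write(path.sub(/\.json\z/, '-aardvark.json'), JSON.pretty_generate(aardvark))
end
```

Building the map in a separate first pass keeps the conversion itself stateless, so individual records can be migrated in any order.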
Per a conversation at a sprint standup meeting, we should consider updating the GeoCombine test fixture. It seems out of sync with the GeoBlacklight test fixture. Maybe that's okay, but it should probably be part of the process for adding Aardvark support generally.
I updated the `full_geoblacklight.json` and `full_geoblacklight_aardvark.json` fixtures as part of #143, so that I could test that the migrator turns the former into the latter. If those files don't represent accurate GeoBlacklight documents, though, we should definitely correct that... let me know if there are any issues you find!