Metadata auto-population
Overview of the Request

I wanted to add a feature to my Dataverse installation that can auto-populate the metadata description field of a dataset based on the contents of an uploaded file with a certain filename. For example: when a user uploads a file called "metadata.json" to a dataset called "Models", I essentially want to auto-populate the description field of the "Models" dataset with the contents of "metadata.json".
I tried doing this with workflows and by adding a separate web service that triggers the metadata update API. However, I'm facing some challenges with security and maintenance. Is there a better way to achieve this? Thanks.
Other than polling to see when a relevant file appears (e.g. by searching for that filename), there's currently no mechanism besides workflows for Dataverse to directly trigger an external app. (There are previewer/explore/configure external tools, but those have to be launched by the user, so you could give users a button to run your extra processing.)
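For illustration, the polling approach could be a small script that periodically queries the Search API for files with that name. This is only a sketch: SERVER_URL and API_TOKEN are placeholders, and the client-side name check guards against the query matching more than exact file names.

```python
# Poll the Dataverse Search API for newly appearing "metadata.json" files.
# SERVER_URL and API_TOKEN are placeholders for your installation.
import time
import requests

SERVER_URL = "https://demo.dataverse.org"  # placeholder
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder

seen = set()

while True:
    r = requests.get(
        f"{SERVER_URL}/api/search",
        params={"q": '"metadata.json"', "type": "file", "per_page": 100},
        headers={"X-Dataverse-key": API_TOKEN},
    )
    r.raise_for_status()
    for item in r.json()["data"]["items"]:
        # Filter client-side in case the query matches more than file names.
        if item.get("name") == "metadata.json" and item.get("file_id") not in seen:
            seen.add(item["file_id"])
            # ...download the file and update the dataset description here
    time.sleep(300)  # poll every five minutes
```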
FWIW: There has been discussion of creating events for dataset/file creation, etc. that would help here, but I'm not aware of any active work on this.
> auto populate the metadata description field in a dataset based on the contents of an uploaded file having a certain filename
@TanayKarve this sounds a lot like the processing we do on FITS files today:
https://guides.dataverse.org/en/5.12/user/dataset-management.html#astronomy-fits
> FWIW: There has been discussion of creating events for dataset/file creation, etc. that would help here, but I'm not aware of any active work on this.
No promises, but @atrisovic and I are hoping to populate some dataset-level metadata from NetCDF files as part of upcoming work. Please see the brainstorming doc linked from https://github.com/IQSS/dataverse-pm/issues/22
Sorry I didn't reply at https://groups.google.com/g/dataverse-community/c/WNKhKHYvWg0/m/iz87b60HEQAJ yet (busy, then traveling). Thanks for also opening an issue. Would you like to add some sample metadata.json files to https://github.com/IQSS/dataverse-sample-data? That's where I'm planning to put some NetCDF files for testing.
@pdurbin Sure, thanks! I looked into NetCDF, and having that implemented would really solve my issue! However, I was a bit confused about the FITS files. Can computational workflows edit dataset metadata? If yes, where can I find the documentation? Do I need to use the native API to update metadata, or is there a different way to do it through FAIR computational workflows?
@TanayKarve maybe we can try to talk this out someday at https://chat.dataverse.org . Or you're welcome to ask during a community call: https://dataverse.org/community-calls
The workflows @qqmyers and I are talking about are called simply "workflows", not "computational workflows". No, I don't think they'll help, unless you're ok with something like this:
- upload file
- publish dataset v1.0
- workflow triggered, updates description, publishes v1.1
(Yes, in this case you'd use the native API for the workflow updating the description.)
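For concreteness, a workflow step's call to the native API could look something like the sketch below, which uses the editMetadata endpoint to replace the description. SERVER_URL, API_TOKEN, and DOI are placeholders, and the dsDescription field structure should be checked against your installation's citation metadata block.

```python
# Replace a dataset's description via the native API's editMetadata endpoint.
# SERVER_URL, API_TOKEN, and DOI are placeholders.
import requests

SERVER_URL = "https://demo.dataverse.org"  # placeholder
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder
DOI = "doi:10.5072/FK2/EXAMPLE"  # placeholder

# The new description, e.g. the contents of the uploaded metadata.json.
with open("metadata.json") as f:
    new_description = f.read()

payload = {
    "fields": [
        {
            "typeName": "dsDescription",
            "value": [
                {
                    "dsDescriptionValue": {
                        "typeName": "dsDescriptionValue",
                        "multiple": False,
                        "typeClass": "primitive",
                        "value": new_description,
                    }
                }
            ],
        }
    ]
}

r = requests.put(
    f"{SERVER_URL}/api/datasets/:persistentId/editMetadata",
    params={"persistentId": DOI, "replace": "true"},
    headers={"X-Dataverse-key": API_TOKEN},
    json=payload,
)
r.raise_for_status()
```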
What might help is if we added a pre-publish workflow. Perhaps on save of metadata? Or on uploading a file? Something like that. Right now the only workflow is on publish.
@4tikhonov would probably tell you to use a database trigger. Here, check this out:
- `cur.execute("LISTEN released_versionstate_datasetversion;")` at https://github.com/IQSS/dataverse-docker/blob/2ca2c0889cc0030913a706eb085289ed664d690b/triggers/external-services.py#L20
- `PERFORM pg_notify('released_versionstate_datasetversion...` at https://github.com/IQSS/dataverse-docker/blob/33bb6b84a2a5f1cfe0596a1ba0660e5d27821ed8/triggers/external-service.sql#L23
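The shape of that approach: a SQL trigger calls pg_notify when a dataset version's state changes, and a small listener process reacts to the notification. Below is a minimal sketch of the listener side, assuming psycopg2 and placeholder connection details; it is not the exact dataverse-docker script.

```python
# Listen for Postgres NOTIFY events fired by a trigger on the datasetversion
# table, then run custom processing. Connection details are placeholders.
import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=dvndb user=dvnapp host=localhost")  # placeholder
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()

# Subscribe to the channel the SQL trigger notifies on.
cur.execute("LISTEN released_versionstate_datasetversion;")

while True:
    # Block until the trigger fires pg_notify(...), with a 60s timeout.
    if select.select([conn], [], [], 60) == ([], [], []):
        continue  # timeout, keep waiting
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        # notify.payload is whatever the trigger passed to pg_notify;
        # this is where you'd call the native API to update the description.
        print(f"dataset version released: {notify.payload}")
```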
I believe Tanay has moved on. Closing.