Metadata auto-population
Overview of the Request

I wanted to add a feature to my Dataverse installation that can auto-populate the metadata description field of a dataset based on the contents of an uploaded file with a certain filename. For example: when a user uploads a file called "metadata.json" to a dataset called "Models", I essentially want to auto-populate the description field of the "Models" dataset with the contents of "metadata.json".
I tried doing this with workflows and by adding a separate web service that triggers the metadata update API. However, I'm facing some challenges with security and maintenance. Is there a better way to achieve this? Thanks.
Other than polling to see when a relevant file appears (e.g. by searching for that filename), there's currently no mechanism besides workflows for Dataverse to directly trigger an external app. (There are previewer/explore/configure external tools, but those have to be launched by the user, so you could give users a button to run your extra processing.)
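For illustration, the polling approach could be a small script that periodically queries the Search API for files with that name. This is only a sketch: SERVER_URL and API_TOKEN are placeholders, and the client-side name check guards against the query matching more than exact file names.

```python
# Poll the Dataverse Search API for newly appearing "metadata.json" files.
# SERVER_URL and API_TOKEN are placeholders for your installation.
import time
import requests

SERVER_URL = "https://demo.dataverse.org"  # placeholder
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder

seen = set()

while True:
    r = requests.get(
        f"{SERVER_URL}/api/search",
        params={"q": '"metadata.json"', "type": "file", "per_page": 100},
        headers={"X-Dataverse-key": API_TOKEN},
    )
    r.raise_for_status()
    for item in r.json()["data"]["items"]:
        # Filter client-side in case the query matches more than file names.
        if item.get("name") == "metadata.json" and item.get("file_id") not in seen:
            seen.add(item["file_id"])
            # ...download the file and update the dataset description here
    time.sleep(300)  # poll every five minutes
```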
FWIW: There has been discussion of creating events for dataset/file creation, etc. that would help here, but I'm not aware of any active work on this.
> auto populate the metadata description field in a dataset based on the contents of an uploaded file having a certain filename
@TanayKarve this sounds a lot like the processing we do on FITS files today:
https://guides.dataverse.org/en/5.12/user/dataset-management.html#astronomy-fits
> FWIW: There has been discussion of creating events for dataset/file creation, etc. that would help here, but I'm not aware of any active work on this.
No promises, but @atrisovic and I are hoping to populate some dataset-level metadata from NetCDF files as part of upcoming work. Please see the brainstorming doc linked from https://github.com/IQSS/dataverse-pm/issues/22
Sorry I didn't reply at https://groups.google.com/g/dataverse-community/c/WNKhKHYvWg0/m/iz87b60HEQAJ yet (busy, then traveling). Thanks for also opening an issue. Would you like to add some sample metadata.json files to https://github.com/IQSS/dataverse-sample-data? That's where I'm planning to put some NetCDF files for testing.
@pdurbin Sure, thanks! I looked into NetCDF, and having that implemented would really solve my issue! However, I was a bit confused about the FITS files. Can computational workflows edit dataset metadata? If yes, where can I find the documentation? Do I need to use the native API to update metadata, or is there a different way to do it through FAIR computational workflows?
@TanayKarve maybe we can try to talk this out someday at https://chat.dataverse.org . Or you're welcome to ask during a community call: https://dataverse.org/community-calls
The workflows @qqmyers and I are talking about are called simply "workflows", not "computational workflows". No, I don't think they'll help, unless you're ok with something like this:
- upload file
- publish dataset v1.0
- workflow triggered, updates description, publishes v1.1
(Yes, in this case you'd use the native API for the workflow updating the description.)
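For concreteness, a workflow step's call to the native API could look something like the sketch below, which uses the editMetadata endpoint to replace the description. SERVER_URL, API_TOKEN, and DOI are placeholders, and the dsDescription field structure should be checked against your installation's citation metadata block.

```python
# Replace a dataset's description via the native API's editMetadata endpoint.
# SERVER_URL, API_TOKEN, and DOI are placeholders.
import requests

SERVER_URL = "https://demo.dataverse.org"  # placeholder
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder
DOI = "doi:10.5072/FK2/EXAMPLE"  # placeholder

# The new description, e.g. the contents of the uploaded metadata.json.
with open("metadata.json") as f:
    new_description = f.read()

payload = {
    "fields": [
        {
            "typeName": "dsDescription",
            "value": [
                {
                    "dsDescriptionValue": {
                        "typeName": "dsDescriptionValue",
                        "multiple": False,
                        "typeClass": "primitive",
                        "value": new_description,
                    }
                }
            ],
        }
    ]
}

r = requests.put(
    f"{SERVER_URL}/api/datasets/:persistentId/editMetadata",
    params={"persistentId": DOI, "replace": "true"},
    headers={"X-Dataverse-key": API_TOKEN},
    json=payload,
)
r.raise_for_status()
```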
What might help is if we added a pre-publish workflow. Perhaps on save of metadata? Or on uploading a file? Something like that. Right now the only workflow is on publish.
@4tikhonov would probably tell you to use a database trigger. Here, check this out:
- `cur.execute("LISTEN released_versionstate_datasetversion;")` at https://github.com/IQSS/dataverse-docker/blob/2ca2c0889cc0030913a706eb085289ed664d690b/triggers/external-services.py#L20
- `PERFORM pg_notify('released_versionstate_datasetversion...` at https://github.com/IQSS/dataverse-docker/blob/33bb6b84a2a5f1cfe0596a1ba0660e5d27821ed8/triggers/external-service.sql#L23
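The shape of that approach: a SQL trigger calls pg_notify when a dataset version's state changes, and a small listener process reacts to the notification. Below is a minimal sketch of the listener side, assuming psycopg2 and placeholder connection details; it is not the exact dataverse-docker script.

```python
# Listen for Postgres NOTIFY events fired by a trigger on the datasetversion
# table, then run custom processing. Connection details are placeholders.
import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=dvndb user=dvnapp host=localhost")  # placeholder
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()

# Subscribe to the channel the SQL trigger notifies on.
cur.execute("LISTEN released_versionstate_datasetversion;")

while True:
    # Block until the trigger fires pg_notify(...), with a 60s timeout.
    if select.select([conn], [], [], 60) == ([], [], []):
        continue  # timeout, keep waiting
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        # notify.payload is whatever the trigger passed to pg_notify;
        # this is where you'd call the native API to update the description.
        print(f"dataset version released: {notify.payload}")
```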
I believe Tanay has moved on. Closing.