
Metadata auto population

TanayKarve opened this issue 3 years ago.

Overview of the Request

I wanted to add a feature to my Dataverse installation that auto-populates the description metadata field of a dataset from the contents of an uploaded file with a particular filename. For example, when a user uploads a file called "metadata.json" to a dataset called "Models", I want the description field of the "Models" dataset to be populated with the contents of "metadata.json".

I tried doing this with workflows and with a separate web service that calls the metadata update API. However, I'm running into security and maintenance challenges. Is there a better way to achieve this? Thanks.

TanayKarve, Oct 18 '22

Other than polling to see when a relevant file appears (e.g. by searching for that filename), there's currently no mechanism besides workflows for Dataverse to directly trigger an external app. (There are previewer/explore/config tools, but those have to be launched by the user, so you could give users a button that runs your extra processing.)
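As a rough illustration of the polling approach, here is a minimal Python sketch that queries the Search API for files with a given name and reports ones it hasn't seen before. It assumes the Search API's JSON response shape (`data.items`, with file items carrying `type`, `name`, `file_id` and `dataset_persistent_id`); `build_search_url`, `new_matches` and `poll` are hypothetical helper names, and pagination and error handling are omitted.

```python
import json
import time
import urllib.parse
import urllib.request


def build_search_url(server_url: str, filename: str) -> str:
    """Build a Search API URL that looks for files matching `filename`."""
    params = urllib.parse.urlencode({"q": f'"{filename}"', "type": "file"})
    return f"{server_url}/api/search?{params}"


def new_matches(response: dict, filename: str, seen: set) -> list:
    """Return (file_id, dataset_persistent_id) pairs for exact-name matches not seen before.

    Assumes file items in response["data"]["items"] carry "type", "name",
    "file_id" and "dataset_persistent_id".
    """
    found = []
    for item in response.get("data", {}).get("items", []):
        # Search results may include partial matches, so keep exact filenames only.
        if item.get("type") != "file" or item.get("name") != filename:
            continue
        file_id = item.get("file_id")
        if file_id in seen:
            continue
        seen.add(file_id)
        found.append((file_id, item.get("dataset_persistent_id")))
    return found


def poll(server_url: str, filename: str, api_token: str = "", interval_s: float = 60.0) -> None:
    """Poll forever, printing newly found files (first page of results only)."""
    seen: set = set()
    url = build_search_url(server_url, filename)
    # With an API token, the search can include drafts that the token's user can see.
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    while True:
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req) as resp:
            payload = json.load(resp)
        for file_id, pid in new_matches(payload, filename, seen):
            print(f"New {filename} (file {file_id}) in dataset {pid}")
        time.sleep(interval_s)
```

Keeping the response filtering in a pure function makes it easy to test without a server. Whether a freshly uploaded file shows up right away may also depend on when Dataverse indexes it.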

FWIW: There has been discussion of creating events for dataset/file creation, etc. that would help here, but I'm not aware of any active work on this.

qqmyers, Oct 18 '22

"auto populate the metadata description field in a dataset based on the contents of an uploaded file having a certain filename"

@TanayKarve this sounds a lot like the processing we do on FITS files today:

https://guides.dataverse.org/en/5.12/user/dataset-management.html#astronomy-fits

"FWIW: There has been discussion of creating events for dataset/file creation, etc. that would help here, but I'm not aware of any active work on this."

No promises, but @atrisovic and I are hoping to populate some dataset-level metadata from NetCDF files as part of

Please see the brainstorming doc linked from here: https://github.com/IQSS/dataverse-pm/issues/22

Sorry I didn't reply at https://groups.google.com/g/dataverse-community/c/WNKhKHYvWg0/m/iz87b60HEQAJ yet (busy, then traveling). Thanks for also opening an issue. Would you like to add some sample metadata.json files to https://github.com/IQSS/dataverse-sample-data ? That's where I'm planning to put some NetCDF files for testing.

pdurbin, Oct 21 '22

@pdurbin Sure, thanks! I looked into the NetCDF work, and having it implemented would really solve my issue! However, I also looked into the FITS processing and was a bit confused by it. Can computational workflows edit dataset metadata? If yes, where can I find the documentation? Do I need to use the native API to update metadata, or is there a different way to do it through FAIR computational workflows?

TanayKarve, Oct 26 '22

@TanayKarve maybe we can try to talk this out someday at https://chat.dataverse.org . Or you're welcome to ask during a community call: https://dataverse.org/community-calls

The workflows @qqmyers and I are talking about are called simply "workflows", not "computational workflows". No, I don't think they'll help, unless you're ok with something like this:

  • upload file
  • publish dataset v1.0
  • workflow triggered, updates description, publishes v1.1

(Yes, in this case you'd use the native API for the workflow updating the description.)
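For the "upload, publish, update, publish again" flow above, the native API calls might look roughly like this minimal Python sketch. It assumes the `editMetadata` endpoint (with `replace=true`) and the `:publish` action with `type=minor`; the helper names are hypothetical, and the JSON for the compound `dsDescription` field is illustrative and should be checked against the native API guide for your Dataverse version.

```python
import json
import urllib.parse
import urllib.request


def build_description_update(server_url: str, api_token: str,
                             persistent_id: str, description: str) -> urllib.request.Request:
    """Build (but do not send) a request replacing the dataset description."""
    # Assumed native-API JSON shape for the compound dsDescription field.
    body = {
        "fields": [
            {
                "typeName": "dsDescription",
                "value": [
                    {
                        "dsDescriptionValue": {
                            "typeName": "dsDescriptionValue",
                            "value": description,
                        }
                    }
                ],
            }
        ]
    }
    query = urllib.parse.urlencode({"persistentId": persistent_id, "replace": "true"})
    url = f"{server_url}/api/datasets/:persistentId/editMetadata?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/json"},
        method="PUT",
    )


def build_publish(server_url: str, api_token: str, persistent_id: str) -> urllib.request.Request:
    """Build (but do not send) a request publishing a minor version, e.g. v1.1."""
    query = urllib.parse.urlencode({"persistentId": persistent_id, "type": "minor"})
    url = f"{server_url}/api/datasets/:persistentId/actions/:publish?{query}"
    return urllib.request.Request(
        url, data=b"", headers={"X-Dataverse-key": api_token}, method="POST"
    )
```

In practice the service would first fetch the contents of metadata.json (for example through the Access API), pass that text as `description`, and then send each request with `urllib.request.urlopen`. Building the requests separately from sending them keeps the payloads easy to inspect and test.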

What might help is if we added a pre-publish workflow. Perhaps on save of metadata? Or on uploading a file? Something like that. Right now the only workflow is on publish.

@4tikhonov would probably tell you to use a database trigger. Here, check this out:

  • cur.execute("LISTEN released_versionstate_datasetversion;") at https://github.com/IQSS/dataverse-docker/blob/2ca2c0889cc0030913a706eb085289ed664d690b/triggers/external-services.py#L20
  • PERFORM pg_notify('released_versionstate_datasetversion... at https://github.com/IQSS/dataverse-docker/blob/33bb6b84a2a5f1cfe0596a1ba0660e5d27821ed8/triggers/external-service.sql#L23
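The listener side of that trigger approach could look something like this minimal Python sketch using psycopg2's LISTEN/NOTIFY support. The channel name comes from the linked script; the payload format depends on the trigger function (the pg_notify line above is truncated), so treating it as JSON is an assumption, and `listen` and `parse_payload` are hypothetical names.

```python
import json
import select


def parse_payload(payload):
    """Decode a NOTIFY payload: JSON is assumed, falling back to the raw value."""
    try:
        return json.loads(payload)
    except (json.JSONDecodeError, TypeError):
        return payload


def listen(dsn: str, channel: str = "released_versionstate_datasetversion",
           timeout_s: float = 5.0):
    """Block on PostgreSQL LISTEN and yield decoded notification payloads.

    psycopg2 is imported lazily so parse_payload can be used without it.
    `channel` is interpolated into SQL, so it must be a trusted identifier.
    """
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    # In autocommit mode LISTEN takes effect immediately, not at the next commit.
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute(f"LISTEN {channel};")
    while True:
        # Wait for the connection's socket to become readable, or time out and retry.
        if select.select([conn], [], [], timeout_s) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            yield parse_payload(notify.payload)
```

Each yielded payload could then drive the same native-API description update. Since the channel name suggests the trigger fires when a version is released, this would still run after publication rather than on upload.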

pdurbin, Nov 09 '22

I believe Tanay has moved on. Closing.

pdurbin, Oct 08 '23