
[dataset]: Seagrass Dipnet Data

Open brianneduclos opened this issue 4 months ago • 4 comments

Contact details

[email protected]

Dataset Title

EcoMem Seagrass Dipnet Data 2025

Describe your dataset and any specific challenges or blockers you have or anticipate.

I have a dataset from a colleague who is currently running an experiment on the effects of boat scars and grazing on the macrofauna communities associated with seagrass meadows. These records were collected with a dipnet at the sample sites, though the data currently contain no information about the sampling protocol. The taxonomic columns already use Darwin Core terms (scientificName, scientificNameID, and acceptedNameUsageID), but the remaining columns are not in Darwin Core.

Info about "raw" Data Files.

I have a single sheet with 9 columns: plotID, latitude, longitude, sample date, scientific name, the AphiaID, the numeric AphiaID on its own, count, and dry weight in grams. It is pulled from a larger, multi-sheet file my colleague maintains.
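For reference, a minimal sketch of how columns like those could be mapped onto Darwin Core occurrence terms in R. The raw column names (`latitude`, `sample_date`, `dry_weight_g`, etc.) are assumptions about the sheet, not confirmed, and the dry weight could alternatively go into an extendedMeasurementOrFact table for OBIS:

```r
library(dplyr)

# Assumed raw column names; adjust to match the actual sheet.
dwc <- raw %>%
  rename(
    decimalLatitude  = latitude,
    decimalLongitude = longitude,
    eventDate        = sample_date,
    scientificName   = scientific_name,
    individualCount  = count
  ) %>%
  mutate(
    basisOfRecord        = "HumanObservation",
    organismQuantity     = dry_weight_g,
    organismQuantityType = "dry weight (g)"
  )
```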

brianneduclos avatar Sep 29 '25 15:09 brianneduclos

I have a working repository where I've posted the code I've written over the last couple of days to programmatically transform this dataset into a Darwin Core Archive with R: https://github.com/brianneduclos/my-first-dwc-archive. Remaining tasks for me to do (that I can think of, anyway; I'm sure there's more!) are:

  • write code for the metadata using the EML package
  • connect with the data provider to see if I can get more info for the archive (depth, coordinate uncertainty, sampling methods, etc.)
    • see if we can add her taxon matching code to create a more complete data processing script
  • Figure out whether the way I'm generating UUIDs will be a problem in future runs of this script with the rest of the data (I suspect it will, and I may end up waiting to mobilize until the data are fully processed if I can't find a way to stabilize them)
  • clean up the code (it's real rough right now!), ideally into an R markdown file we can use to help others learn how to build archives from raw occurrences
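On the UUID stability question: randomly generated UUIDs will change on every run, but IDs derived deterministically from fields that never change do not. A minimal sketch, assuming `plotID`, `sampleDate`, and `scientificName` together identify a record (if the same taxon can appear twice per plot and date, a row suffix or similar would be needed):

```r
library(dplyr)

# Deterministic IDs: built from stable fields, so re-running the
# script on the same rows always produces the same identifiers.
dwc <- dwc %>%
  mutate(
    eventID = paste("EcoMem-dipnet", plotID, sampleDate, sep = ":"),
    occurrenceID = paste(eventID,
                         gsub(" ", "_", scientificName),
                         sep = ":")
  )
```

If true UUIDs are preferred, name-based (v5) UUIDs computed from the same concatenated string would also be stable across runs, unlike random (v4) ones.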

These updates will be tracked in future versions of the script. Thanks for providing an opportunity to get started on this work!

brianneduclos avatar Nov 06 '25 17:11 brianneduclos

Great work Brianne!

7yl4r avatar Nov 07 '25 15:11 7yl4r

My 2 cents. Don't worry about writing code to generate EML metadata. Unless you're doing this as some batch process for multiple datasets or really want to automate all of it, it's not all that worth it for one dataset. Instead, work on collecting the appropriate EML metadata and you can add it to the IPT using the browser interface.

MathewBiddle avatar Nov 13 '25 13:11 MathewBiddle

> My 2 cents. Don't worry about writing code to generate EML metadata. Unless you're doing this as some batch process for multiple datasets or really want to automate all of it, it's not all that worth it for one dataset. Instead, work on collecting the appropriate EML metadata and you can add it to the IPT using the browser interface.

Thanks for your thoughts! I agree with you: I'm not planning on coding the metadata going forward, but I do kind of want to try it just to say that I've done it, you know? The IPT browser interface is definitely the better tool for providing and formatting EML metadata.
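For the just-to-say-I-did-it version, the EML package builds the document from nested lists. A minimal sketch (names and roles here are placeholders, not the real dataset contacts):

```r
library(EML)

# Placeholder person; fill in the actual creator/contact details.
me <- list(individualName = list(givenName = "Given", surName = "Surname"))

my_eml <- list(dataset = list(
  title   = "EcoMem Seagrass Dipnet Data 2025",
  creator = me,
  contact = me
))

write_eml(my_eml, "eml.xml")
eml_validate("eml.xml")  # check the result against the EML schema
```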

brianneduclos avatar Nov 14 '25 17:11 brianneduclos