[dataset]: Seagrass Dipnet Data
Dataset Title
EcoMem Seagrass Dipnet Data 2025
Describe your dataset and any specific challenges or blockers you have or anticipate.
I have a dataset from a colleague who is currently running an experiment on the effects of boat scars and grazing on the macrofauna communities associated with seagrass meadows. These records were collected with a dipnet at the sample sites, though the sampling protocol is not yet documented in the data. The taxonomic information uses Darwin Core terms (scientificName, scientificNameID, and acceptedNameUsageID), but the remaining columns are not in Darwin Core.
Info about "raw" Data Files.
I have a single sheet with 9 columns: plotID, latitude, longitude, sample date, scientific name, aphiaID, a second ID column with just the numeric AphiaID, count, and dry weight in grams. This is pulled from a larger multi-sheet file my colleague maintains.
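To make the mapping concrete, here's a minimal sketch of how those columns could be renamed to Darwin Core terms in R. The raw column names and the file name are my placeholders and may not match the actual sheet:

```r
library(dplyr)
library(readr)

# Placeholder file and column names -- the real sheet may differ.
raw <- read_csv("seagrass_dipnet_2025.csv")

occ <- raw %>%
  rename(
    eventID          = plot_id,     # plotID; could also map to locationID
    decimalLatitude  = lat,
    decimalLongitude = lon,
    eventDate        = sample_date,
    individualCount  = count
  ) %>%
  mutate(
    basisOfRecord    = "HumanObservation",
    occurrenceStatus = "present",
    geodeticDatum    = "WGS84"      # assumed; confirm with the provider
  )

# scientificName, scientificNameID, and acceptedNameUsageID are already
# Darwin Core terms and pass through unchanged; dry weight would go to
# an extendedMeasurementOrFact table rather than the occurrence core.
```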
I have a working repository where I've posted the code I've written over the last couple of days to programmatically transform this dataset into a Darwin Core Archive with R: https://github.com/brianneduclos/my-first-dwc-archive. Remaining tasks for me to do (that I can think of, anyway; I'm sure there's more!) are:
- write code for the metadata using the EML package (a rough sketch of what I'm imagining is at the end of this post)
- connect with the data provider to see if I can get more info for the archive (depth, coordinate uncertainty, sampling methods, etc.)
- see if we can add her taxon matching code to create a more complete data processing script
- figure out whether the way I'm generating UUIDs will be a problem in future runs of this script on the rest of the data (I suspect it will, and I may end up waiting to mobilize until the data are fully processed if I can't find a way to stabilize them; see the sketch right after this list)
- clean up the code (it's real rough right now!), ideally into an R Markdown file we can use to help others learn how to build archives from raw occurrences
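On the UUID item: if the identifiers come from something like `uuid::UUIDgenerate()` (random, version 4), they'll change on every run of the script. One possible fix, assuming the uuid package at version 1.0-0 or later, is to switch to name-based (version 5) UUIDs, which hash a stable key so the same record always gets the same identifier:

```r
library(uuid)

# Fixed namespace UUID for this dataset: generate it once with
# UUIDgenerate(), then hard-code it so it never changes (placeholder value).
ns <- "2f1c8d3e-4b5a-4c6d-8e7f-9a0b1c2d3e4f"

# Key on fields that uniquely and permanently identify a record; if two
# rows could share this key, add another field to disambiguate.
occ$occurrenceID <- UUIDfromName(
  ns,
  paste(occ$eventID, occ$eventDate, occ$scientificName, sep = ":")
)
```

Same inputs give the same UUIDs on every run, so the occurrenceIDs stay stable across reprocessing.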
These updates will be tracked in future versions of the script. Thanks for providing an opportunity to get started on this work!
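In case it's useful to anyone else, the EML-package approach I want to try looks roughly like this; every field value below is a placeholder:

```r
library(EML)

contact <- list(individualName = list(givenName = "First", surName = "Last"))

doc <- list(
  dataset = list(
    title    = "EcoMem Seagrass Dipnet Data 2025",
    creator  = contact,
    contact  = contact,
    pubDate  = "2025",
    abstract = list(para = "Macrofauna dipnet sampling in seagrass meadows."),
    intellectualRights = list(para = "Placeholder license statement.")
  )
)

write_eml(doc, "eml.xml")  # serialize the list to EML XML
eml_validate("eml.xml")    # validate against the EML schema
```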
Great work, Brianne!
My 2 cents. Don't worry about writing code to generate EML metadata. Unless you're doing this as some batch process for multiple datasets or really want to automate all of it, it's not all that worth it for one dataset. Instead, work on collecting the appropriate EML metadata and you can add it to the IPT using the browser interface.
Thanks for your thoughts! I agree with you -- I'm not planning on coding the metadata going forward, but I do kind of want to try it just to say that I've done it, you know? The IPT browser interface is much better for providing and formatting EML metadata, for sure.