[dataset]: Animal Satellite Telemetry data
Contact details
Dataset Title
ATN satellite telemetry data
Describe your dataset and any specific challenges or blockers you have or anticipate.
We are very close to a final netCDF template for ATN's satellite trajectory deployment files.
https://github.com/ioos/ioos-atn-data/blob/main/templates/atn_trajectory_template.cdl
Last year, I developed an R script to read in the template and start creating a DwC-A package. This year I'd like to finish that work, assuming we finish the template and create some example files.
https://github.com/MathewBiddle/ioos_code_lab/blob/r_nc2dwc/jupyterbook/content/code_gallery/data_management_notebooks/DRAFT-R-netCDF2DwC.ipynb
xref:
- https://github.com/ioos/ioos_code_lab/pull/13
Link to "raw" Data Files.
https://github.com/ioos/ioos-atn-data/tree/main/data
The netCDF specification will be documented at https://ioos.github.io/ioos-atn-data/
We need to decide on a decimation strategy. The interval between observations varies from 2 minutes to multiple days. Below are two examples of the time difference between consecutive points in an example dataset:
- 2009-09-25 11:09:00 to 2009-09-25 11:11:00
- 2009-10-08 20:24:00 to 2009-10-15 11:05:00
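To make the spread concrete, here is a minimal sketch that computes the gap for the two pairs of timestamps listed above. The helper name `gaps` is hypothetical and only the Python standard library is used.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def gaps(timestamps):
    """Return the time difference between each pair of consecutive fixes."""
    times = sorted(datetime.strptime(t, FMT) for t in timestamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]


short_gap = gaps(["2009-09-25 11:09:00", "2009-09-25 11:11:00"])[0]
long_gap = gaps(["2009-10-08 20:24:00", "2009-10-15 11:05:00"])[0]
print(short_gap)  # 0:02:00
print(long_gap)   # 6 days, 14:41:00
```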
The decimation strategy that ETN and OTN are working on for acoustic telemetry data is down to a lot of hard work by Peter Desmet and Jonas Mortelmans, and is based on some of Peter's work on camtrap-dp and with other satellite-tagged animals. It employs an aggregation strategy of 'take the first detection/location per hour', with other Darwin Core fields like dataGeneralizations helping characterize the summarization by indicating how many detections have been obfuscated by the aggregation.
The benefit of this method is that each retained detection is a real point in space and time at which the animal was observed, and it puts a hard upper bound per tag on how many occurrences a single individual/tag can generate. There's a lot of background information and ancillary decisions about how to characterize things like coordinateUncertainty in https://github.com/inbo/etn/issues/256, and the logic for the decimation of the events themselves is here: https://github.com/inbo/etn/blob/main/inst/sql/dwc_occurrence.sql
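As a rough illustration of the 'first location per hour' idea, the sketch below groups records by organism and clock hour, keeps the earliest record in each group, and notes how many records it stands for. It is not the ETN SQL: the function name `decimate_hourly`, the record keys, and the exact wording written to dataGeneralizations are assumptions made for this example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def decimate_hourly(records):
    """Keep the first record per (organism, clock hour).

    `records` is a list of dicts with the hypothetical keys
    'organismID', 'time' (a string in FMT), 'lat' and 'lon'.
    """
    ordered = sorted(
        records,
        key=lambda r: (r["organismID"], datetime.strptime(r["time"], FMT)),
    )
    buckets = {}
    for rec in ordered:
        t = datetime.strptime(rec["time"], FMT)
        hour = t.replace(minute=0, second=0, microsecond=0)
        key = (rec["organismID"], hour)
        if key not in buckets:
            # Records are sorted, so the first one seen is the earliest.
            buckets[key] = {"first": rec, "count": 0}
        buckets[key]["count"] += 1

    decimated = []
    for bucket in buckets.values():
        row = dict(bucket["first"])
        # Illustrative wording only; check the ETN query for the real text.
        row["dataGeneralizations"] = (
            f"subsampled by hour: first of {bucket['count']} record(s)"
        )
        decimated.append(row)
    return decimated


# Three fixes taken from the example data table (obs 2, 3 and 4).
fixes = [
    {"organismID": "09_13", "time": "2009-09-25 11:09:00", "lat": 34.024, "lon": -118.556},
    {"organismID": "09_13", "time": "2009-09-25 11:11:00", "lat": 34.035, "lon": -118.549},
    {"organismID": "09_13", "time": "2009-09-27 17:58:00", "lat": 34.033, "lon": -118.547},
]
result = decimate_hourly(fixes)
for row in result:
    print(row["time"], row["dataGeneralizations"])
```

With these three fixes, the two points at 11:09 and 11:11 collapse into the 11:09 record, so at most 24 occurrences per animal per day can be produced.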
I've got more code coming that deals with pulling together an Event Core version, with the Occurrences still being generated in a decimated way like this, but with tag attachment and listening station deployments being handled as Events and more things being reported as Extended Measurement or Facts.
I created an example DwC-A package in this PR https://github.com/ioos/ioos_code_lab/pull/13/commits/e58b2b5a340053ee82b0b4da532afc853b1182cf
The template still isn't finalized so I don't want to go too far down the road, but @albenson-usgs gave some great feedback on the initial package, to start addressing:
- [x] `eventID` needs to be unique for each row in the event file. Right now it's a single `eventID` for all rows in the event file.
- [x] `locationID` = Release - I'm not sure what that means. I'm confused why we decided to put that in that field and it doesn't seem like a good fit. Can you explain?
- [x] No need to repeat columns in the occurrence file that are already in the event file. So `eventDate`, `decimalLatitude`, `decimalLongitude`, `geodeticDatum` can be dropped. `coordinateUncertaintyInMeters` belongs in the event file and hopefully it can be populated.
- [x] `occurrenceID` seems strange to me. It is unique for each row but it's basically the `eventDate` with "_0_Species" after it. Maybe this is ok but just strikes me weird.
- [x] The `organismID` probably shouldn't have any spaces in it.
- [x] `occurrenceStatus` is missing and is "present" for all rows.
- [x] We're still missing other info about the organism that might be beneficial like `sex`, `lifeStage`.
For reference, below is a table of the data available (dumped from the netCDF file), followed by the netCDF header of the metadata available. THESE ARE EXAMPLE DATA and therefore I have redacted some information about the PI.
I think we can address all of the comments above from the available data and metadata.
data table:
| obs | deploy_id | time | z | lat | lon | ptt | instrument | type | location_class | error_radius | semi_major_axis | semi_minor_axis | ellipse_orientation | offset | offset_orientation | gpe_msd | gpe_u | count | qartod_time_flag | qartod_speed_flag | qartod_location_flag | qartod_rollup_flag | crs | trajectory | animal_age | animal_life_stage | animal_sex | animal_weight | animal_length | animal_length_2 | animal | instrument_tag | instrument_location | taxon_name | taxon_lsid | comment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 09_13-45866 | 2009-09-23 00:00:00 | 0 | 34.03 | -118.56 | 45866 | SPOT | User | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1 | 2 | 1 | 1 | -2147483647 | 5f0668a86321be13bc7ef628 | nan | juvenile | male | nan | 213 | nan | 09_13 | Wildlife Computers SPOT5 | Wildlife Computers SPOT5 | Carcharodon carcharias | urn:lsid:marinespecies.org:taxname:105838 | |
| 1 | 09_13-45866 | 2009-09-25 06:42:00 | 0 | 23.59 | -166.18 | 45866 | SPOT | Argos | A | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1 | 4 | 1 | 4 | -2147483647 | 5f0668a86321be13bc7ef628 | nan | juvenile | male | nan | 213 | nan | 09_13 | Wildlife Computers SPOT5 | Wildlife Computers SPOT5 | Carcharodon carcharias | urn:lsid:marinespecies.org:taxname:105838 | |
| 2 | 09_13-45866 | 2009-09-25 11:09:00 | 0 | 34.024 | -118.556 | 45866 | SPOT | Argos | 1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1 | 4 | 1 | 4 | -2147483647 | 5f0668a86321be13bc7ef628 | nan | juvenile | male | nan | 213 | nan | 09_13 | Wildlife Computers SPOT5 | Wildlife Computers SPOT5 | Carcharodon carcharias | urn:lsid:marinespecies.org:taxname:105838 | |
| 3 | 09_13-45866 | 2009-09-25 11:11:00 | 0 | 34.035 | -118.549 | 45866 | SPOT | Argos | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1 | 4 | 1 | 4 | -2147483647 | 5f0668a86321be13bc7ef628 | nan | juvenile | male | nan | 213 | nan | 09_13 | Wildlife Computers SPOT5 | Wildlife Computers SPOT5 | Carcharodon carcharias | urn:lsid:marinespecies.org:taxname:105838 | |
| 4 | 09_13-45866 | 2009-09-27 17:58:00 | 0 | 34.033 | -118.547 | 45866 | SPOT | Argos | 1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1 | 1 | 1 | 1 | -2147483647 | 5f0668a86321be13bc7ef628 | nan | juvenile | male | nan | 213 | nan | 09_13 | Wildlife Computers SPOT5 | Wildlife Computers SPOT5 | Carcharodon carcharias | urn:lsid:marinespecies.org:taxname:105838 | |
netCDF metadata:
xarray.Dataset {
dimensions:
obs = 29 ;
variables:
object deploy_id() ;
deploy_id:long_name = id for this deployment. This is typically the tag ptt ;
deploy_id:comment = Friendly name given to the tag by the user. If no specific friendly name is given, this is the PTT id. ;
deploy_id:instrument = instrument_location ;
deploy_id:platform = animal ;
deploy_id:coverage_content_type = referenceInformation ;
datetime64[ns] time(obs) ;
time:standard_name = time ;
time:axis = T ;
time:_CoordinateAxisType = Time ;
time:long_name = Time of the measurement, in seconds since 1990-01-01 ;
time:actual_min = 2009-09-23T00:00:00Z ;
time:actual_max = 2009-11-23T05:12:00Z ;
time:ancillary_variables = qartod_time_flag qartod_rollup_flag qartod_speed_flag ;
time:instrument = instrument_location ;
time:platform = animal ;
time:coverage_content_type = coordinate ;
float64 z(obs) ;
z:axis = Z ;
z:long_name = depth of measurement ;
z:positive = down ;
z:standard_name = depth ;
z:units = m ;
z:actual_min = 0.0 ;
z:actual_max = 0.0 ;
z:instrument = ;
z:platform = animal ;
z:comment = This variable is synthetically generated to represent the depth of observations ;
z:coverage_content_type = coordinate ;
float64 lat(obs) ;
lat:axis = Y ;
lat:_CoordinateAxisType = Lat ;
lat:long_name = Latitude portion of location in decimal degrees North ;
lat:standard_name = latitude ;
lat:units = degrees_north ;
lat:valid_max = 90.0 ;
lat:valid_min = -90.0 ;
lat:actual_min = 23.59 ;
lat:actual_max = 34.045 ;
lat:ancillary_variables = qartod_location_flag qartod_rollup_flag qartod_speed_flag error_radius semi_major_axis semi_minor_axis ellipse_orientation offset offset_orientation ;
lat:instrument = instrument_location ;
lat:platform = animal ;
lat:coverage_content_type = coordinate ;
float64 lon(obs) ;
lon:axis = X ;
lon:_CoordinateAxisType = Lon ;
lon:long_name = Longitude portion of location in decimal degrees East ;
lon:standard_name = longitude ;
lon:units = degrees_east ;
lon:valid_max = 180.0 ;
lon:valid_min = -180.0 ;
lon:actual_min = -166.18 ;
lon:actual_max = -118.504 ;
lon:ancillary_variables = qartod_location_flag qartod_rollup_flag qartod_speed_flag error_radius semi_major_axis semi_minor_axis ellipse_orientation offset offset_orientation ;
lon:instrument = instrument_location ;
lon:platform = animal ;
lon:coverage_content_type = coordinate ;
float64 ptt(obs) ;
ptt:long_name = Platform Transmitter Terminal (PTT) id used for Argos transmissions ;
ptt:comment = PTT id for this deployment. PTT ids may be used on multiple deployments, but not concurrently. When combined with deployment dates, PTTs can uniquely identify a deployment. ;
ptt:coverage_content_type = referenceInformation ;
ptt:instrument = instrument_location ;
ptt:platform = animal ;
object instrument(obs) ;
instrument:comment = Wildlife Computers instrument family. Variable may report manufacturer default values (e.g., Mk10) and may not match correctly defined instrument_location or instrument_tag variables and attributes. ;
instrument:long_name = Instrument family ;
instrument:instrument = instrument_location ;
instrument:platform = animal ;
instrument:coverage_content_type = referenceInformation ;
object type(obs) ;
type:comment = Type of location: Argos, FastGPS or User ;
type:long_name = Type of location information - Argos, GPS satellite or user provided location ;
type:instrument = instrument_location ;
type:platform = animal ;
type:coverage_content_type = referenceInformation ;
object location_class(obs) ;
location_class:standard_name = quality_flag ;
location_class:comment = Quality codes from the ARGOS satellite (in meters): G,3,2,1,0,A,B,Z. See http://www.argos-system.org/manual/3-location/34_location_classes.htm ;
location_class:long_name = Location Quality Code from ARGOS satellite system ;
location_class:code_values = G,3,2,1,0,A,B,Z ;
location_class:code_meanings = estimated error less than 100m and 1+ messages received per satellite pass, estimated error less than 250m and 4+ messages received per satellite pass, estimated error between 250m and 500m and 4+ messages per satellite pass, estimated error between 500m and 1500m and 4+ messages per satellite pass, estimated error greater than 1500m and 4+ messages received per satellite pass, no least squares estimated error or unbounded kalman filter estimated error and 3 messages received per satellite pass, no least squares estimated error or unbounded kalman filter estimated error and 1 or 2 messages received per satellite pass, invalid location (available for Service Plus or Auxilliary Location Processing) ;
location_class:instrument = instrument_location ;
location_class:platform = animal ;
location_class:ancillary_variables = lat lon ;
location_class:coverage_content_type = qualityInformation ;
float64 error_radius(obs) ;
error_radius:long_name = Error radius ;
error_radius:units = m ;
error_radius:comment = If the position is best represented as a circle, this field gives the radius of that circle in meters. ;
error_radius:instrument = instrument_location ;
error_radius:platform = animal ;
error_radius:ancillary_variables = lat lon offset offset_orientation ;
error_radius:coverage_content_type = qualityInformation ;
float64 semi_major_axis(obs) ;
semi_major_axis:long_name = Error - ellipse semi-major axis ;
semi_major_axis:units = m ;
semi_major_axis:comment = If the estimated position error is best expressed as an ellipse, this field gives the length in meters of the semi-major elliptical axis (one half of the major axis). ;
semi_major_axis:instrument = instrument_location ;
semi_major_axis:platform = animal ;
semi_major_axis:ancillary_variables = lat lon ellipse_orientation offset offset_orientation ;
semi_major_axis:coverage_content_type = qualityInformation ;
float64 semi_minor_axis(obs) ;
semi_minor_axis:long_name = Error - ellipse semi-minor axis ;
semi_minor_axis:units = m ;
semi_minor_axis:comment = If the estimated position error is best expressed as an ellipse, this field gives the length in meters of the semi-minor elliptical axis (one half of the minor axis). ;
semi_minor_axis:instrument = instrument_location ;
semi_minor_axis:platform = animal ;
semi_minor_axis:ancillary_variables = lat lon ellipse_orientation offset offset_orientation ;
semi_minor_axis:coverage_content_type = qualityInformation ;
float64 ellipse_orientation(obs) ;
ellipse_orientation:long_name = Error - ellipse orientation in degrees clockwise from true north ;
ellipse_orientation:units = degrees ;
ellipse_orientation:comment = The angle in degrees of the ellipse from true north, proceeding clockwise (0 to 360). A blank field represents 0 degrees. ;
ellipse_orientation:instrument = instrument_location ;
ellipse_orientation:platform = animal ;
ellipse_orientation:ancillary_variables = lat lon semi_major_axis semi_minor_axis offset offset_orientation ;
ellipse_orientation:coverage_content_type = qualityInformation ;
float64 offset(obs) ;
offset:long_name = Error - offset in meters to center of error ellipse or circle ;
offset:units = m ;
offset:comment = This field is non-zero if the circle or ellipse are not centered on the (Latitude, Longitude) values on this row. "Offset" gives the distance in meters from (Latitude, Longitude) to the center of the ellipse. ;
offset:instrument = instrument_location ;
offset:platform = animal ;
offset:ancillary_variables = lat lon error_radius semi_major_axis semi_minor_axis offset_orientation ;
offset:coverage_content_type = qualityInformation ;
float64 offset_orientation(obs) ;
offset_orientation:long_name = Error - offset orientation angle to ellipse center ;
offset_orientation:units = degrees ;
offset_orientation:comment = If the "Offset" field is non-zero, this field is the angle in degrees from (Latitude, Longitude) to the center of the ellipse. Zero degrees is true north; a blank field represents 0 degrees. ;
offset_orientation:instrument = instrument_location ;
offset_orientation:platform = animal ;
offset_orientation:ancillary_variables = lat lon error_radius semi_major_axis semi_minor_axis offset ;
offset_orientation:coverage_content_type = qualityInformation ;
float64 gpe_msd(obs) ;
gpe_msd:comment = Historical. No longer applicable. ;
gpe_msd:long_name = ;
gpe_msd:units = ;
gpe_msd:instrument = instrument_location ;
gpe_msd:platform = animal ;
gpe_msd:coverage_content_type = auxillaryInformation ;
float64 gpe_u(obs) ;
gpe_u:comment = Historical. No longer applicable. ;
gpe_u:long_name = ;
gpe_u:units = ;
gpe_u:instrument = instrument_location ;
gpe_u:platform = animal ;
gpe_u:coverage_content_type = auxillaryInformation ;
float64 count(obs) ;
count:comment = Total number of times a particular data item was received, verified, and successfully decoded. ;
count:long_name = Count ;
count:units = count ;
count:instrument = instrument_location ;
count:platform = animal ;
count:coverage_content_type = auxillaryInformation ;
float32 qartod_time_flag(obs) ;
qartod_time_flag:standard_name = gross_range_test_quality_flag ;
qartod_time_flag:long_name = Time QC test - gross range test ;
qartod_time_flag:implementation = https://github.com/ioos/ioos_qc/ ;
qartod_time_flag:flag_meanings = PASS NOT_EVALUATED SUSPECT FAIL MISSING ;
qartod_time_flag:flag_values = [1 2 3 4 9] ;
qartod_time_flag:references = https://cdn.ioos.noaa.gov/media/2020/03/QARTOD_TS_Manual_Update2_200324_final.pdf ;
qartod_time_flag:coverage_content_type = qualityInformation ;
float32 qartod_speed_flag(obs) ;
qartod_speed_flag:standard_name = gross_range_test_quality_flag ;
qartod_speed_flag:long_name = Speed QC test - gross range test ;
qartod_speed_flag:references = https://cdn.ioos.noaa.gov/media/2020/03/QARTOD_TS_Manual_Update2_200324_final.pdf ;
qartod_speed_flag:implementation = https://github.com/ioos/ioos_qc/ ;
qartod_speed_flag:flag_meanings = PASS NOT_EVALUATED SUSPECT FAIL MISSING ;
qartod_speed_flag:flag_values = [1 2 3 4 9] ;
qartod_speed_flag:coverage_content_type = qualityInformation ;
float32 qartod_location_flag(obs) ;
qartod_location_flag:standard_name = location_test_quality_flag ;
qartod_location_flag:long_name = Location QC test - Location test ;
qartod_location_flag:implementation = https://github.com/ioos/ioos_qc/ ;
qartod_location_flag:flag_meanings = PASS NOT_EVALUATED SUSPECT FAIL MISSING ;
qartod_location_flag:flag_values = [1 2 3 4 9] ;
qartod_location_flag:references = https://cdn.ioos.noaa.gov/media/2020/03/QARTOD_TS_Manual_Update2_200324_final.pdf ;
qartod_location_flag:coverage_content_type = qualityInformation ;
float32 qartod_rollup_flag(obs) ;
qartod_rollup_flag:standard_name = aggregate_quality_flag ;
qartod_rollup_flag:long_name = Aggregate QC value ;
qartod_rollup_flag:implementation = https://github.com/ioos/ioos_qc/ ;
qartod_rollup_flag:flag_meanings = PASS NOT_EVALUATED SUSPECT FAIL MISSING ;
qartod_rollup_flag:flag_values = [1 2 3 4 9] ;
qartod_rollup_flag:references = https://cdn.ioos.noaa.gov/media/2020/03/QARTOD_TS_Manual_Update2_200324_final.pdf ;
qartod_rollup_flag:coverage_content_type = qualityInformation ;
int32 crs() ;
crs:epsg_code = EPSG:4326 ;
crs:grid_mapping_name = latitude_longitude ;
crs:inverse_flattening = 298.257223563 ;
crs:long_name = Coordinate Reference System - http://www.opengis.net/def/crs/EPSG/0/4326 ;
crs:semi_major_axis = 6378137.0 ;
crs:coverage_content_type = referenceInformation ;
object trajectory() ;
trajectory:cf_role = trajectory_id ;
trajectory:long_name = trajectory identifier ;
float64 animal_age() ;
animal_age:units = ;
animal_age:long_name = age of the animal as measured or estimated at deployment ;
animal_age:coverage_content_type = referenceInformation ;
animal_age:animal_age = Not provided ;
object animal_life_stage() ;
animal_life_stage:animal_life_stage = juvenile ;
animal_life_stage:long_name = Lifestage of the animal at time of deployment ;
animal_life_stage:coverage_content_type = referenceInformation ;
object animal_sex() ;
animal_sex:animal_sex = male ;
animal_sex:long_name = sex of the animal at time of tag deployment ;
animal_sex:coverage_content_type = referenceInformation ;
float32 animal_weight() ;
animal_weight:units = kg ;
animal_weight:long_name = mass of the animal as measured or estimated at deployment ;
animal_weight:animal_weight = Not provided ;
animal_weight:coverage_content_type = referenceInformation ;
float32 animal_length() ;
animal_length:animal_length_type = total length ;
animal_length:units = cm ;
animal_length:animal_length = 213.0 (cm) total length ;
animal_length:long_name = length of the animal as measured or estimated at deployment ;
animal_length:coverage_content_type = referenceInformation ;
float32 animal_length_2() ;
animal_length_2:animal_length_2_type = Not provided ;
animal_length_2:units = ;
animal_length_2:animal_length_2 = Not provided ;
animal_length_2:long_name = length of the animal as measured or estimated at deployment ;
animal_length_2:coverage_content_type = referenceInformation ;
object animal() ;
animal:suborder = ;
animal:infraorder = ;
animal:scientificname = Carcharodon carcharias ;
animal:long_name = tagged animal id ;
animal:superdomain = Biota ;
animal:order = Lamniformes ;
animal:authority = (Linnaeus, 1758) ;
animal:kingdom = Animalia ;
animal:species = Carcharodon carcharias ;
animal:genus = Carcharodon ;
animal:megaclass = ;
animal:family = Lamnidae ;
animal:taxonRankID = 220 ;
animal:class = Elasmobranchii ;
animal:cf_role = trajectory_id ;
animal:coverage_content_type = referenceInformation ;
animal:subphylum = Vertebrata ;
animal:phylum = Chordata ;
animal:AphiaID = 105838 ;
animal:valid_name = Carcharodon carcharias ;
animal:infraphylum = Gnathostomata ;
animal:subclass = Neoselachii ;
animal:rank = Species ;
object instrument_tag() ;
instrument_tag:manufacturer = Wildlife Computers ;
instrument_tag:make_model = SPOT5 ;
instrument_tag:serial_number = 07S0230 ;
instrument_tag:long_name = telemetry tag applied to animal ;
instrument_tag:coverage_content_type = referenceInformation ;
instrument_tag:calibration_date = Not Provided ;
object instrument_location() ;
instrument_location:manufacturer = Wildlife Computers ;
instrument_location:make_model = SPOT5 ;
instrument_location:serial_number = 07S0230 ;
instrument_location:long_name = Wildlife Computers SPOT5 ;
instrument_location:location_type = argos / modeled ;
instrument_location:comment = Location ;
instrument_location:coverage_content_type = referenceInformation ;
instrument_location:calibration_date = Not Provided ;
object taxon_name() ;
taxon_name:standard_name = biological_taxon_name ;
taxon_name:long_name = most precise taxonomic classification for the tagged animal ;
taxon_name:coverage_content_type = referenceInformation ;
taxon_name:source = Froese, R. and D. Pauly. Editors. (2023). FishBase. Carcharodon carcharias (Linnaeus, 1758). Accessed through: World Register of Marine Species at: https://www.marinespecies.org/aphia.php?p=taxdetails&id=105838 on 2023-08-16 ;
taxon_name:url = https://www.marinespecies.org/aphia.php?p=taxdetails&id=105838 ;
<U41 taxon_lsid() ;
taxon_lsid:standard_name = biological_taxon_lsid ;
taxon_lsid:long_name = Namespaced Taxon Identifier for the tagged animal ;
taxon_lsid:coverage_content_type = referenceInformation ;
taxon_lsid:source = Froese, R. and D. Pauly. Editors. (2023). FishBase. Carcharodon carcharias (Linnaeus, 1758). Accessed through: World Register of Marine Species at: https://www.marinespecies.org/aphia.php?p=taxdetails&id=105838 on 2023-08-16 ;
taxon_lsid:url = https://www.marinespecies.org/aphia.php?p=taxdetails&id=105838 ;
object comment(obs) ;
comment:long_name = Comment ;
comment:comment = Optional text field ;
comment:instrument = instrument_location ;
comment:platform = animal ;
comment:coverage_content_type = auxillaryInformation ;
// global attributes:
:date_created = 2023-08-16T20:00:00Z ;
:featureType = trajectory ;
:cdm_data_type = Trajectory ;
:Conventions = CF-1.10, ACDD-1.3, IOOS-1.2 ;
:argos_program_number = ;
:creator_email = ;
:id = 5f0668a86321be13bc7ef628 ;
:tag_type = SPOT5 ;
:source = Service Argos ;
:acknowledgement = NOAA IOOS, Axiom Data Science, Navy ONR, NOAA NMFS, Wildlife Computers, Argos, IOOS ATN ;
:creator_name = ;
:creator_url = ;
:geospatial_lat_units = degrees_north ;
:geospatial_lon_units = degrees_east ;
:infoUrl = ;
:institution = ;
:keywords = EARTH SCIENCE > AGRICULTURE > ANIMAL SCIENCE > ANIMAL ECOLOGY AND BEHAVIOR, EARTH SCIENCE > BIOSPHERE > ECOLOGICAL DYNAMICS > SPECIES/POPULATION INTERACTIONS > MIGRATORY RATES/ROUTES, EARTH SCIENCE > OCEANS, EARTH SCIENCE > CLIMATE INDICATORS > BIOSPHERIC INDICATORS > SPECIES MIGRATION, EARTH SCIENCE > OCEANS, EARTH SCIENCE > BIOLOGICAL CLASSIFICATION > ANIMALS/VERTEBRATES, EARTH SCIENCE > BIOSPHERE > ECOSYSTEMS > MARINE ECOSYSTEMS, PROVIDERS > GOVERNMENT AGENCIES-U.S. FEDERAL AGENCIES > DOC > NOAA > IOOS, PROVIDERS > COMMERCIAL > Axiom Data Science ;
:license = These data may be used and redistributed for free, but are not intended for legal use, since they may contain inaccuracies. No person or group associated with these data makes any warranty, expressed or implied, including warranties of merchantability and fitness for a particular purpose, or assumes any legal liability for the accuracy, completeness or usefulness of this information. This disclaimer applies to both individual use of these data and aggregate use with other data. It is strongly recommended that users read and fully comprehend associated metadata prior to use. Please acknowledge the U.S. Animal Telemetry Network (ATN) or the specified citation as the source from which these data were obtained in any publications and/or representations of these data. Communication and collaboration with dataset authors are strongly encouraged. ;
:metadata_link = ;
:naming_authority = com.wildlifecomputers ;
:platform_category = animal ;
:platform = fish ;
:platform_vocabulary = https://vocab.nerc.ac.uk/collection/L06/current/ ;
:processing_level = NetCDF file created from position data obtained from Wildlife Computers API. ;
:project = Project White Shark: Juvenile Satellite Biotelemetry, 2001-2020 ;
:publisher_email = [email protected] ;
:publisher_institution = US Integrated Ocean Observing System Office ;
:publisher_name = US Integrated Ocean Observing System (IOOS) Animal Telemetry Network (ATN) ;
:publisher_url = https://atn.ioos.us/ ;
:publisher_country = USA ;
:standard_name_vocabulary = CF-v78 ;
:vendor = Wildlife Computers ;
:geospatial_lat_min = 23.59 ;
:geospatial_lat_max = 34.045 ;
:geospatial_lon_min = -166.18 ;
:geospatial_lon_max = -118.504 ;
:geospatial_bbox = POLYGON ((-118.504 23.59, -118.504 34.045, -166.18 34.045, -166.18 23.59, -118.504 23.59)) ;
:geospatial_bounds = POLYGON ((-166.18 23.59, -118.581 34.038, -118.53 34.045, -118.504 33.989, -118.534 33.972, -119.75 33.517, -166.18 23.59)) ;
:geospatial_bounds_crs = EPSG:4326 ;
:time_coverage_start = 2009-09-23T00:00:00Z ;
:time_coverage_end = 2009-11-23T05:12:00Z ;
:time_coverage_duration = P61DT5H12M0S ;
:time_coverage_resolution = P2DT2H39M43S ;
:date_issued = 2023-08-16T20:00:00Z ;
:date_modified = 2023-08-16T20:00:00Z ;
:history = 2023-08-07T20:24:04Z - Created by the IOOS ATN DAC from the Wildlife Computers API ;
:summary = Wildlife Computers SPOT5 tag (ptt id 45866) deployed on a great white shark (Carcharodon carcharias) by Chris G. Lowe in the North Pacific Ocean from 2009-09-23 to 2009-11-23 ;
:title = Great white shark (Carcharodon carcharias) location data from a satellite telemetry tag (ptt id 45866) deployed in the North Pacific Ocean from 2009-09-23 to 2009-11-23, deployment id 5f0668a86321be13bc7ef628 ;
:uuid = ff554ebf-bf4b-5a82-8a90-9c0ceb799d96 ;
:platform_name = Carcharodon carcharias ;
:platform_id = 105838 ;
:vendor_id = 5f0668a86321be13bc7ef628 ;
:sea_name = North Pacific Ocean ;
:arbitrary_keywords = ATN, Animal Telemetry Network, IOOS, Integrated Ocean Observing System, trajectory, satellite telemetry tag ;
:contributor_role_vocabulary = https://vocab.nerc.ac.uk/collection/G04/current/ ;
:creator_role_vocabulary = https://vocab.nerc.ac.uk/collection/G04/current/ ;
:creator_sector_vocabulary = https://mmisw.org/ont/ioos/sector ;
:creator_type = person ;
:date_metadata_modified = 20230816 ;
:instrument = Satellite telemetry tag ;
:instrument_vocabulary = ;
:keywords_vocabulary = GCMD Science Keywords v15.1 ;
:ncei_template_version = NCEI_NetCDF_Trajectory_Template_v2.0 ;
:product_version = ;
:program = IOOS Animal Telemetry Network ;
:publisher_type = institution ;
:references = ;
:animal_common_name = great white shark ;
:animal_id = 09_13 ;
:animal_scientific_name = Carcharodon carcharias ;
:deployment_id = 5f0668a86321be13bc7ef628 ;
:deployment_start_datetime = 2009-09-23T00:00:00Z ;
:deployment_end_datetime = 2009-11-23T00:00:00Z ;
:wmo_platform_code = ;
:comment = 09_13-45866 ;
:ptt_id = 45866 ;
:deployment_start_lat = 34.03 ;
:deployment_start_lon = -118.56 ;
:contributor_name = ;
:contributor_email = ;
:contributor_role = collaborator ;
:contributor_institution = ;
:contributor_url = ;
:creator_role = principalInvestigator ;
:creator_sector = academic ;
:creator_country = USA ;
:creator_institution = ;
:creator_institution_url = ;
:citation = ;
}
</p>
</details>
@albenson-usgs I'm poking around in this now.
For locationID I followed the guidance at https://github.com/tdwg/dwc-for-biologging/wiki/Acoustic-sensor-enabled-tracking-of-blue-sharks
But maybe that's only for the tagging event?
Now that I'm fiddling with the data more, I'm wondering if there should be two/three events.
- Tagging of the animal
- automated tracking of the animal via satellite telemetry
- recovery of animal (if applicable?)
cc @mmckinzie
Maybe https://github.com/tdwg/dwc-for-biologging/wiki/Movebank-GPS-data#darwin-core-recommendation is the right way?
This is what I understand from the text on movebank GPS data:
flowchart LR
A([Deployment])
B([Tag attachment])
C([GPS positions])
A --parentEventID--> B
A --parentEventID--> C
subgraph parent event
A
end
subgraph child events
B
C
end
I worked through some reorganizing after discussion on the Slack space. I think I have addressed most of the comments in https://github.com/ioos/bio_data_guide/issues/145#issuecomment-1692201277
It was decided to go with occurrence and emof (no event).
Here are the files and notebook for review:
- occurrence - https://github.com/MathewBiddle/ioos_code_lab/blob/r_nc2dwc/jupyterbook/content/code_gallery/data_management_notebooks/atn_45866_occurrence.csv
  - `coordinateUncertaintyInMeters` is populated with fill values. Apparently this deployment doesn't have information about `error_radius`, `semi_major_axis`, `semi_minor_axis`, or `offset` to use for this entry. Is there something we can do when we don't have that information?
- emof - https://github.com/MathewBiddle/ioos_code_lab/blob/r_nc2dwc/jupyterbook/content/code_gallery/data_management_notebooks/atn_45866_emof.csv
  - There is missing data in the source file. I will write some additional code to check whether data exist and only write out records when there are observations. For now, I populated the missing data with fill values but moved the available metadata into the emof record to give a sense of what the table will look like.
- R notebook - https://github.com/MathewBiddle/ioos_code_lab/blob/r_nc2dwc/jupyterbook/content/code_gallery/data_management_notebooks/DRAFT-R-netCDF2DwC.ipynb
I am most curious about additional information we could be porting into the occurrence or emof record. For example, we have information about the instrument family (e.g. SPOT), the type of location (Argos, FastGPS or User), the location quality code from the ARGOS satellite system, the Platform Transmitter Terminal (PTT) id used for Argos transmissions, instrument_tag (the telemetry tag applied to the animal, including serial_number and make_model), and instrument_location (serial_number and make_model). Further information about each of those variables is included in the netCDF metadata in this comment https://github.com/ioos/bio_data_guide/issues/145#issuecomment-1692211792
We also have a few flag variables (time, speed, location, and rollup) and a bunch of metadata that could be stuck somewhere.
ATN data are now being archived at NCEI. For the notebook I'm working on here, I would like to pull the source data from this archival information package. https://www.ncei.noaa.gov/archive/accession/0282699
File - https://www.nodc.noaa.gov/archive/arc0217/0282699/1.1/data/0-data/atn_45866_great-white-shark_trajectory_20090923-20091123.nc
@sformel-usgs will handle the next review on this. Also I know that @jdpye published some (lots?) of data to OBIS somewhat recently and might have some words of wisdom to share.
We did!
I looked over Mat's shoulder briefly at the IOOS DMAC but I would gently recommend we further align this to the standard that OTN and ETN had worked out for all our satellite and acoustic telemetry data publishing, if it's possible. Just a bit of summarization of the occurrences to keep the row count manageable when our datasets get included in general queries against OBIS in the future.
Here is the mapping table for the occurrence record:
| DarwinCore | netCDF |
|---|---|
| basisOfRecord | data contained in the type variable where type of User = HumanObservation and Argos = MachineObservation. |
| organismID | platform_id global attribute plus the animal_common_name global attribute. |
| eventDate | data contained in time variable. Converted to ISO8601. |
| occurrenceID | eventDate, plus data contained in z variable, plus animal_common_name global attribute. |
| decimalLatitude | data in lat variable. |
| decimalLongitude | data in lon variable. |
| geodeticDatum | attribute epsg_code in the crs variable. |
| eventID | animal_common_name global attribute plus the eventDate. |
| kingdom | kingdom attribute in the animal variable. |
| taxonRank | rank attribute in the animal variable. |
| occurrenceStatus | hardcoded to present. |
| sex | data from the variable animal_sex. |
| lifeStage | data from the variable animal_life_stage. |
| scientificName | data from the variable taxon_name. |
| scientificNameID | data from the variable taxon_lsid. |
| coordinateUncertaintyInMeters | maximum value of the data from the variables error_radius, semi_major_axis, and offset. |
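To make the occurrence mapping concrete, here is a minimal Python sketch (standard library only) that builds one occurrence row from values already read out of the netCDF file. Several things are assumptions for illustration and are not confirmed by the template: the `_` separator used to join ID parts, the treatment of FastGPS as MachineObservation, the `time` value being seconds since 1970-01-01 UTC, and the plain-dict shape of `global_attrs` and `point`.

```python
from datetime import datetime, timezone

# Assumption: FastGPS fixes are treated like Argos fixes (machine observations).
BASIS_OF_RECORD = {
    "User": "HumanObservation",
    "Argos": "MachineObservation",
    "FastGPS": "MachineObservation",
}


def to_occurrence(global_attrs, point, sep="_"):
    """Map one trajectory point (a plain dict) to a Darwin Core occurrence row."""
    # Assumption: point["time"] is seconds since 1970-01-01 UTC.
    event_date = datetime.fromtimestamp(point["time"], tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    name = global_attrs["animal_common_name"]
    return {
        "basisOfRecord": BASIS_OF_RECORD[point["type"]],
        "organismID": f"{global_attrs['platform_id']}{sep}{name}",
        "eventDate": event_date,
        # eventDate + z + common name, joined with a hypothetical separator
        "occurrenceID": f"{event_date}{sep}{point['z']}{sep}{name}",
        "eventID": f"{name}{sep}{event_date}",
        "decimalLatitude": point["lat"],
        "decimalLongitude": point["lon"],
        "occurrenceStatus": "present",
    }
```

The taxonomic fields (kingdom, taxonRank, scientificName, scientificNameID) are left out to keep the sketch short; they would be copied straight from the animal and taxon variables.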
And for the measurement or fact file
The measurementOrFact file will only contain information for records where basisOfRecord == HumanObservation, as these observations were made in person when the animal was directly tagged.
| DarwinCore Term | Status | netCDF |
|---|---|---|
| organismID | | The platform_id global attribute plus the animal_common_name global attribute. |
| occurrenceID | Required | eventDate, plus data contained in z variable, plus animal_common_name global attribute. |
| measurementType | Required | long_name attribute of the animal_weight, animal_length, animal_length_2 variables. |
| measurementValue | Required | The data from the animal_weight, animal_length, animal_length_2 variables. |
| eventID | Strongly Recommended | animal_common_name global attribute plus the eventDate. |
| measurementUnit | Strongly Recommended | unit attribute of the animal_weight, animal_length, animal_length_2 variables. |
| measurementMethod | Strongly Recommended | animal_weight, animal_length, animal_length_2 attributes of their respective variables. |
| measurementTypeID | Strongly Recommended | mapping table somewhere? |
| measurementMethodID | Strongly Recommended | mapping table somewhere? |
| measurementUnitID | Strongly Recommended | mapping table somewhere? |
| measurementAccuracy | Share if available | |
| measurementDeterminedDate | Share if available | |
| measurementDeterminedBy | Share if available | |
| measurementRemarks | Share if available | |
| measurementValueID | Share if available | |
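As a rough illustration of the measurement or fact mapping, the Python sketch below turns the tagging-time measurements into eMoF rows. The shape of the `variables` dict (one entry per netCDF variable with `long_name`, `unit`, and `value` keys) is a hypothetical stand-in for however the notebook reads the file, and the ID columns are omitted because their vocabularies are still undecided.

```python
import math

MEASUREMENT_VARIABLES = ("animal_weight", "animal_length", "animal_length_2")


def to_emof(occurrence, variables):
    """Build eMoF rows for one occurrence; only tagging events carry measurements."""
    if occurrence["basisOfRecord"] != "HumanObservation":
        return []
    rows = []
    for name in MEASUREMENT_VARIABLES:
        var = variables.get(name)
        if var is None:
            continue
        value = var["value"]
        # Skip measurements that were not recorded (None or NaN fill values).
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        rows.append(
            {
                "occurrenceID": occurrence["occurrenceID"],
                "eventID": occurrence["eventID"],
                "measurementType": var["long_name"],
                "measurementValue": value,
                "measurementUnit": var["unit"],
            }
        )
    return rows
```

Skipping missing values here is one way to keep the eMoF limited to measurements that were actually taken.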
@MathewBiddle I'm still getting up to speed on this. Does anything need review right now?
@jdpye From https://github.com/ioos/bio_data_guide/issues/145#issuecomment-1686715497, my understanding is the decimation strategy for these satellite telemetry observations should be:
'take the first detection/location per hour', with other Darwin Core fields like dataGeneralizations helping characterize the summarization by indicating how many detections have been obfuscated by the aggregation.
So, I will work on taking my occurrence table and decimating it to the first detection each hour. Does that sound reasonable?
@sformel-usgs Yes! If you don't mind taking a look at the csv files I reference in https://github.com/ioos/bio_data_guide/issues/145#issuecomment-1710385902, that will help us in the overarching organization of these data. I think the decimation strategy will simply limit the amount of rows from what we have above.
Yep! With this, you can add into dataGeneralizations a string like 'first of # records' to indicate there are more records in the raw dataset to be discovered by the super-curious.
I just finished prototyping a DwC archive to lonboard / Deck.gl vis tool, so I will attempt to eat your DwC archive with it when I get time!
Here's a stab at filtering the occurrence record down to the first occurrence per hour (in Python). https://gist.github.com/MathewBiddle/d434ac2b538b2728aa80c6a7945f94be
Now to write that in R...
Figured out how to do it in R (hacky but works for now):
```r
library(dplyr)      # provides %>%, arrange() and distinct()
library(lubridate)
# sort by date so the first row within each hour is the earliest detection
occurrencedf <- occurrencedf %>% arrange(eventDate)
# create a column of the date truncated to the hour, which is our decimation key (parsed as UTC)
occurrencedf$eventDateHrs <- format(as.POSIXct(occurrencedf$eventDate, format="%Y-%m-%dT%H:%M:%SZ", tz="UTC"),"%Y-%m-%dT%H")
# filter table to only unique date + hour and pick the first row keeping all the columns
occurrencedf <- distinct(occurrencedf,eventDateHrs,.keep_all = TRUE)
# drop the helper column
occurrencedf$eventDateHrs <- NULL
occurrencedf
```
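For comparison, here is a small standard-library Python sketch of the same first-per-hour decimation that also fills in dataGeneralizations with the 'first of # records' string suggested above. It is not the code from the linked gist. It assumes each row is a dict whose `eventDate` is a UTC ISO 8601 string in the `YYYY-MM-DDTHH:MM:SSZ` form, and it groups by `organismID` as well as hour so that several tags could be processed together.

```python
def decimate_hourly(rows):
    """Keep the first row per organism per hour and record how many rows it stands for."""
    first = {}   # (organismID, hour) -> first row seen in that hour
    counts = {}  # (organismID, hour) -> number of raw rows in that hour
    # ISO 8601 strings in one fixed UTC format sort chronologically as text.
    for row in sorted(rows, key=lambda r: r["eventDate"]):
        hour = row["eventDate"][:13]  # "YYYY-MM-DDTHH"
        key = (row.get("organismID"), hour)
        if key not in first:
            first[key] = dict(row)
        counts[key] = counts.get(key, 0) + 1
    decimated = []
    for key, row in first.items():
        row["dataGeneralizations"] = f"first of {counts[key]} records"
        decimated.append(row)
    return decimated
```

Counting the rows in the same pass keeps the summary string consistent with what was actually dropped.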
Filtering by data quality codes
In these data we also have additional information about the Location Quality Code from the ARGOS satellite system and the QARTOD tests. Below are the codes and their meanings.
ARGOS Codes
| code_values | code meanings |
|---|---|
| G | estimated error less than 100m and 1+ messages received per satellite pass |
| 3 | estimated error less than 250m and 4+ messages received per satellite pass |
| 2 | estimated error between 250m and 500m and 4+ messages per satellite pass |
| 1 | estimated error between 500m and 1500m and 4+ messages per satellite pass |
| 0 | estimated error greater than 1500m and 4+ messages received per satellite pass |
| A | no least squares estimated error or unbounded kalman filter estimated error and 3 messages received per satellite pass |
| B | no least squares estimated error or unbounded kalman filter estimated error and 1 or 2 messages received per satellite pass |
| Z | invalid location (available for Service Plus or Auxiliary Location Processing) |
Since codes A, B, and Z are essentially bad values, I propose that we filter those out.
I also propose creating a mapping table for coordinateUncertaintyInMeters that corresponds to the maximum error for each ARGOS code, as shown in the table below:
| code | coordinateUncertaintyInMeters |
|---|---|
| G | 100 |
| 3 | 250 |
| 2 | 500 |
| 1 | 1500 |
| 0 | >1500 (not sure what would go there?) |
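A minimal Python sketch of this proposal follows: drop classes A, B, and Z, then look up coordinateUncertaintyInMeters from the location class. The column name `location_class` is hypothetical, and the value for class 0 is left as a parameter that defaults to blank (`None`) because the upper bound for that class is still an open question.

```python
# Maximum estimated error per ARGOS location class, in meters.
# Class 0 has no stated upper bound, so it is handled separately below.
ARGOS_UNCERTAINTY_M = {"G": 100, "3": 250, "2": 500, "1": 1500}
BAD_CLASSES = {"A", "B", "Z"}


def apply_argos_quality(rows, class_0_uncertainty=None):
    """Drop A/B/Z locations and attach coordinateUncertaintyInMeters to the rest."""
    kept = []
    for row in rows:
        code = str(row["location_class"])  # hypothetical column name
        if code in BAD_CLASSES:
            continue
        out = dict(row)
        if code == "0":
            out["coordinateUncertaintyInMeters"] = class_0_uncertainty
        else:
            # Raises KeyError for an unexpected code instead of guessing a value.
            out["coordinateUncertaintyInMeters"] = ARGOS_UNCERTAINTY_M[code]
        kept.append(out)
    return kept
```

Passing a number for `class_0_uncertainty` makes it easy to switch from blank to a chosen bound once one is agreed on.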
QARTOD Codes
| value | meaning |
|---|---|
| 1 | PASS |
| 2 | NOT_EVALUATED |
| 3 | SUSPECT |
| 4 | FAIL |
| 9 | MISSING |
The QARTOD tests are:
| variable | long_name |
|---|---|
| qartod_time_flag | Time QC test - gross range test |
| qartod_speed_flag | Speed QC test - gross range test |
| qartod_location_flag | Location QC test - Location test |
| qartod_rollup_flag | Aggregate QC value |
I'm not sure what to do here. My preference would be to include all rows where qartod_rollup_flag == 1 and drop the rest. But I'm open to suggestions.
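To show what that preference would look like in practice, here is a short Python sketch that keeps only rows where `qartod_rollup_flag == 1` and reports how many rows were dropped for each flag meaning. It assumes the rows are plain dicts with the flag stored as an integer or an integer-like string; it is one possible approach, not a settled decision.

```python
from collections import Counter

QARTOD_MEANINGS = {1: "PASS", 2: "NOT_EVALUATED", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}


def filter_rollup_pass(rows, flag="qartod_rollup_flag"):
    """Return (rows that passed the rollup test, count of dropped rows per flag meaning)."""
    kept = []
    dropped = Counter()
    for row in rows:
        value = int(row[flag])
        if value == 1:
            kept.append(row)
        else:
            dropped[QARTOD_MEANINGS.get(value, "UNKNOWN")] += 1
    return kept, dict(dropped)
```

The dropped counts could feed a dataset-level note so the filtering is visible to data users.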
@sformel-usgs @jdpye I've updated the notebook (and on nbviewer) to include this decimation strategy, along with some initial filtering based on location class and the addition of dataGeneralizations to the occurrence record. I've also filtered the emof down to only the rows where measurements were actually observed.
- occurrence - https://github.com/MathewBiddle/ioos_code_lab/blob/r_nc2dwc/jupyterbook/content/code_gallery/data_management_notebooks/atn_45866_occurrence.csv
- emof - https://github.com/MathewBiddle/ioos_code_lab/blob/r_nc2dwc/jupyterbook/content/code_gallery/data_management_notebooks/atn_45866_emof.csv
If you don't mind taking a look when you get a chance, it would be much appreciated! I think there are some additional details we can add to the occurrence/emof from the netCDF files, I'm just not sure what.
@MathewBiddle here are some thoughts. I'm still feeling like I don't have a good grasp on all the moving parts, so please ping me here or in Slack if there is anything I didn't address specifically, no matter how small. I don't see any big issues, what you've derived works as a DwC-A. But I'm going to dig through the data a little more and see if there is anything else I think could be included.
- I was able to work through most of the R notebook with no big issues. There are some spots where I think I could help make things more succinct and/or readable. I just forked the repo and will submit a PR with some suggestions. I'll try to do this tomorrow morning.
- I couldn't quickly identify where to grab the file, `atn_trajectory_template.nc`, that is referenced in the EML building (cell 54).
- `coordinateUncertaintyInMeters` needs to be an integer or blank. So, if you can't put a confident maximum boundary on > 1500, then you can leave it blank for unknown. I'll take a closer look at that data when I have some more time.
- I understand that the QARTOD flags are for QC, but I don't know enough about them to say if they should be filtered out. If not, the flags could be included through eMOF (although that might be easy to overlook, and therefore a bad idea).
- For the eMOF attributes that you've marked as "mapping table somewhere?": I'm not sure if this is what you're after, but I think these would need to be found on a case-by-case basis. It should be easy to find some examples for `measurementUnitID`. The other two would depend on whether or not anyone has published the method and type in a database like NERC.
I think I can help find your P01 codes for the measurements, sorry, I didn't look at the emof file on the first pass.
I'll look at this today!
For the coordinateUncertaintyInMeters distance for Argos location class 0, this paper suggests an upper bound of ~10 km. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0063051
From that paper, this quote:
In brief, “good” positions (location codes 3, 2, 1, A) are accurate to about 2 km, while 0 and B locations are accurate to about 5–10 km. However, due to the lognormal distribution of the errors, larger outliers are to be expected in all location codes and need to be accounted for in the user’s data processing.
does not fill my heart with joy, so the upper bound of the estimate is probably a safer value to include.
Thanks for taking a look! I should have mentioned the EML section of the notebook is a work in progress. It should reference the same netcdf file that is used to generate the dwc files (the one from NCEI). I just haven't updated it in a few months.
Something to discuss is whether generating the EML is even necessary. Would OBIS-USA generate the EML? Is there a way for a provider to upload an EML xml file? How should we deal with this, given that we might want to automate the process?
If everyone has filled in their metadata for the NetCDF files in the same way, we should get a simple EML template for this flavour of data and map our incoming data to it, and submit that to your OBIS publication endpoint along with the data, as an initial pass of the metadata for the archive. Amendments can be made after the initial metadata harvest from the source NetCDF, but we should have a good start from there.
If we build a simple eml.xml and zip it up, the metadata pre-populates and will save your OBIS data manager a bit of headache :D
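As a very rough sketch of the "build a simple eml.xml and zip it up" idea, the Python below writes a handful of netCDF global attributes into a skeletal EML document and bundles it with a data file. The attribute names (`title`, `summary`, `creator_institution`) are assumed ACDD-style names and may not match the ATN template, the EML 2.1.1 namespace is the one I believe the IPT uses, and the output has not been validated against the EML schema. A complete Darwin Core Archive would also need a `meta.xml`, which is left out here.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

EML_NS = "eml://ecoinformatics.org/eml-2.1.1"  # assumed to match what the IPT expects


def build_eml(global_attrs, package_id):
    """Return a skeletal EML document as text, built from netCDF global attributes."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "http://gbif.org"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = global_attrs["title"]
    creator = ET.SubElement(dataset, "creator")
    ET.SubElement(creator, "organizationName").text = global_attrs["creator_institution"]
    abstract = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract, "para").text = global_attrs["summary"]
    contact = ET.SubElement(dataset, "contact")
    ET.SubElement(contact, "organizationName").text = global_attrs["creator_institution"]
    return ET.tostring(root, encoding="unicode")


def zip_archive(files):
    """Zip a {filename: text} mapping in memory and return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()
```

Keeping this to the easy fields matches the advice in the thread: generate what can be filled in programmatically and amend the rest after the initial metadata harvest.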
@MathewBiddle the IPT is all fat fingers. So, the more EML you can generate programmatically, the less time it will take and the less chance of human error. But just do the easy stuff, don't worry about getting every detail.
Since these are satellite telemetry observations, our depth of measurement is always == 0, so minimumDepthInMeters and maximumDepthInMeters should be 0, correct? Does it cause an issue if they are the same value?
No that's fine that they are the same value.