soilgrids extract function for the CMS SDA
## Description
First pass at the SoilGrids extract function for the NASA CMS SDA.
## Review Time Estimate
- [ ] Immediately
- [ ] Within one week
- [ ] When possible
## Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Checklist
- [x] My change requires a change to the documentation.
- [ ] My name is in the list of .zenodo.json
- [x] I have updated the CHANGELOG.md.
- [x] I have updated the documentation accordingly.
- [x] I have read the CONTRIBUTING document.
- [x] I have added tests to cover my changes.
- [x] All new and existing tests passed.
We are putting this up as a place for the CMS team to review it and move forward with a full function. A few things to follow up on:
- Does this belong in `data.land` or `data.remote`?
- What are all of the function arguments we want to have?
- We need to pass lat/lons from a BETY site query to this function as a site list, not just a single lat/lon pair (or both?).
- We need to set this function up to "loop over" sites after connecting to the virtual raster once, rather than reconnecting to the VRT for each lat/lon. The data extraction is slow, so we want to extract all sites in one call, not in a loop.
- @DongchenZ How do you want the output data formatted for integration into the SDA workflow? In the short term maybe we use this format, and in the long term create an intermediate format that then gets parsed to SDA format (list of lists?). @mdietze?
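The single-connection, extract-all-sites-at-once approach mentioned above could be sketched roughly as follows. This is only a sketch assuming the `terra` package; `batch_extract` and the `vrt_url` argument are hypothetical names, not part of this PR.

```r
library(terra)

# Sketch: connect to the VRT once, then extract every site in one call
# instead of reconnecting per lat/lon. `site_info` is assumed to be a
# data frame with at least lon/lat columns, as in the BETY query below.
batch_extract <- function(site_info, vrt_url) {
  r <- terra::rast(vrt_url)                        # connect to the VRT once
  pts <- terra::vect(site_info,
                     geom = c("lon", "lat"),
                     crs = "EPSG:4326")            # all sites as one SpatVector
  pts <- terra::project(pts, terra::crs(r))        # match the raster CRS
  vals <- terra::extract(r, pts)                   # one extract call for all sites
  cbind(site_info, vals[, -1, drop = FALSE])       # drop terra's ID column
}
```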
I know I am forgetting some things here, but I wanted to get this up now so Cherry can move forward on completing it if the basic skeleton looks good to folks.
In addition, here are some warnings we might want to look into:
```
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In fn(par, ...) : NAs introduced by coercion
   [... warnings 2-22 repeat "In fn(par, ...) : NAs introduced by coercion" ...]
23: In qgamma(qstat, theta[1], theta[2]) : NaNs produced
```
The main function is in the soilgrids.R file and is currently called `soilgrids.soc.extract`. An example of how the function works at this time:
```r
rm(list = ls())
library(DBI)
library(PEcAn.data.remote)

# BETY connection info
bety <- list(user = "bety", password = "bety", host = "modex.bnl.gov",
             dbname = "betydb", driver = RPostgres::Postgres(), write = FALSE)
con <- DBI::dbConnect(drv = bety$driver, dbname = bety$dbname, host = bety$host,
                      password = bety$password, user = bety$user)

# get site info for a handful of test sites
suppressWarnings(site_qry <- glue::glue_sql(
  "SELECT *, ST_X(ST_CENTROID(geometry)) AS lon,
     ST_Y(ST_CENTROID(geometry)) AS lat
   FROM sites WHERE id IN ({ids*})",
  ids = c("676", "622", "678", "766", "764"), .con = con))
suppressWarnings(qry_results.1 <- DBI::dbSendQuery(con, site_qry))
suppressWarnings(qry_results.2 <- DBI::dbFetch(qry_results.1))
qry_results.2
DBI::dbClearResult(qry_results.1)
DBI::dbDisconnect(con)

# run the extraction
site_list <- qry_results.2
verbose <- TRUE
system.time(result_soc <- soilgrids.soc.extract(site_info = site_list, verbose = verbose))
result_soc
```
and the output looks like this:

```
> result_soc
$Mean_soc
    site_id      lat        lon    0-5cm   5-15cm  15-30cm  30-60cm 60-100cm 100-200cm
622     622 46.24202  -89.34757 65.68854 37.25930 20.36566 7.826109 5.011070  3.126902
676     676 45.80593  -90.07961 49.61828 31.22689 16.61261 8.230412 2.395168  1.734718
678     678 45.94080  -90.27000 55.58001 32.35441 14.74488 6.919386 2.666095  1.773701
764     764 44.31570 -121.60800 74.78708 22.80451 12.64082 8.590361 3.230583  2.568780
766     766 44.43720 -121.56700 69.53522 31.53413 14.27996 7.598909 4.507278  2.793447

$Sdev_soc
    site_id      lat        lon     0-5cm   5-15cm   15-30cm  30-60cm 60-100cm 100-200cm
622     622 46.24202  -89.34757  58.89990 38.01215 13.210002 5.472139 1.294284 2.2844188
676     676 45.80593  -90.07961  45.24691 30.98490  8.879574 6.299148 1.049219 0.6686239
678     678 45.94080  -90.27000  52.04211 33.99952  9.816848 5.382589 1.407709 0.7605814
764     764 44.31570 -121.60800 130.80391 15.02392  9.033595 9.215060 2.353715 1.4176229
766     766 44.43720 -121.56700  77.75313 27.73879  9.968512 6.702437 3.756643 2.3921458
```
The present function works by creating internal virtual rasters for each SoilGrids depth, then extracting the values for each site in the provided site list.
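As a sketch of that approach, per-depth virtual rasters can be opened remotely over GDAL's `/vsicurl/` without downloading anything, assuming the `terra` package. The base URL and the organic carbon density (`ocd`) property name below follow the public SoilGrids layout but are assumptions, not necessarily the exact paths this function uses.

```r
library(terra)

# SoilGrids depth intervals as they appear in the layer names
depths <- c("0-5cm", "5-15cm", "15-30cm", "30-60cm", "60-100cm", "100-200cm")

# Assumed public SoilGrids base URL; check against the actual function
base <- "https://files.isric.org/soilgrids/latest/data"

# one remote SpatRaster per depth, e.g. the "ocd" mean layers
urls   <- sprintf("/vsicurl/%s/ocd/ocd_%s_mean.vrt", base, depths)
layers <- lapply(urls, terra::rast)
names(layers) <- depths
```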
Still needed: the ability to write out an Rdata file containing the results so that we don't have to rerun the extraction every time.
I will update.
Updated to output an Rdata file, but I have not yet set up the function to record the file/results in BETY, as I am not yet sure how to handle the lack of a time element when we do have the site ID. Do we try to capture that so that future CMS runs know where SOC data already exists? How do we record which sites are already in the SOC file? It seems complicated if we are doing runs that only share part of the same site list!
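One possible caching scheme for that question, sketched here with assumed names (`update_soc_cache`, the cache file name, the `site_id` column, and the updated single-data-frame output format shown below), not anything implemented in this PR: key the saved results by `site_id` and only extract the sites missing from the cache.

```r
# Sketch: reuse previously extracted SOC results keyed by site_id,
# extracting only sites not already in the saved Rdata file.
update_soc_cache <- function(site_info, cache_file = "soilgrids_soc.Rdata") {
  cached <- if (file.exists(cache_file)) {
    get(load(cache_file))                      # loads the saved data frame
  } else {
    NULL
  }
  missing <- setdiff(site_info$site_id, cached$site_id)
  if (length(missing) > 0) {
    new_rows <- soilgrids.soc.extract(
      site_info = site_info[site_info$site_id %in% missing, ])
    cached <- rbind(cached, new_rows)
    save(cached, file = cache_file)            # rewrite the cache
  }
  cached
}
```

This sidesteps the "runs that only share part of the same site list" problem by always merging into one file, though it does not address how to register that file in BETY.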
Updating the output format too:

```
> output_data
    site_id      lat       lon    0-5cm   5-15cm  15-30cm  30-60cm 60-100cm 100-200cm stdv_0-5cm stdv_5-15cm stdv_15-30cm stdv_30-60cm stdv_60-100cm stdv_100-200cm
622     622 45.80593 -90.07961 49.61828 31.22689 16.61261 8.230412 2.395168  1.734718   45.24691    30.98490     8.879574     6.299148      1.049219      0.6686239
676     676 46.24202 -89.34757 65.68854 37.25930 20.36566 7.826109 5.011070  3.126902   58.89990    38.01215    13.210002     5.472139      1.294284      2.2844188
678     678 45.94080 -90.27000 55.58001 32.35441 14.74488 6.919386 2.666095  1.773701   52.04211    33.99952     9.816848     5.382589      1.407709      0.7605814
```
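For reference, a combined wide table like the one above could be assembled from the earlier `$Mean_soc`/`$Sdev_soc` pair along these lines; `combine_soc` is a hypothetical helper, and it assumes the two data frames share row order as in the example output.

```r
# Sketch: merge the mean and standard-deviation tables into one wide
# data frame, prefixing the sdev depth columns with "stdv_".
combine_soc <- function(result_soc) {
  mean_df <- result_soc$Mean_soc
  sdev_df <- result_soc$Sdev_soc
  depth_cols <- setdiff(names(sdev_df), c("site_id", "lat", "lon"))
  sdev_only <- sdev_df[, depth_cols, drop = FALSE]
  names(sdev_only) <- paste0("stdv_", depth_cols)
  cbind(mean_df, sdev_only)
}
```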
@serbinsh your comments mention a lot of fixes, but they don't seem to have been pushed into the PR yet.
This is now superseded by @Qianyuxuan's new PR #3040.
There are tests in here that at some point we may want to bring into `data.land`, but I have left those out for now: they are API tests, and we don't want to hold up checks if the API/URL is unreachable.
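If those tests do come back, one option (assuming `testthat` is available) is to skip them automatically when the host is unreachable instead of leaving them out entirely; the host name here is an assumption.

```r
library(testthat)

test_that("SOC extraction reaches the SoilGrids API", {
  # skips (rather than fails) when the host cannot be reached,
  # so CI checks are not held up by an unreachable API/URL
  skip_if_offline(host = "files.isric.org")
  # ... run soilgrids.soc.extract() on a small site list here ...
})
```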