
soilgrids extract function for the CMS SDA

Open serbinsh opened this issue 4 years ago • 6 comments

Description

First pass at the soilgrids extract function for the NASA CMS SDA

Review Time Estimate

  • [ ] Immediately
  • [ ] Within one week
  • [ ] When possible

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • [x] My change requires a change to the documentation.
  • [ ] My name is in the list of .zenodo.json
  • [x] I have updated the CHANGELOG.md.
  • [x] I have updated the documentation accordingly.
  • [x] I have read the CONTRIBUTING document.
  • [x] I have added tests to cover my changes.
  • [x] All new and existing tests passed.

serbinsh avatar Jan 12 '22 20:01 serbinsh

We are providing this here as a place for the CMS team to review it and move forward with a full function. A few things to follow up on:

  1. Does this belong in data.land or data.remote?
  2. What are all of the function arguments we want to have?
  3. We need to pass lat/longs from a BETY site query to this function as a site list, not just a lat/long pair (or both?)
  4. We need to set this function up to "loop over" sites after connecting to the virtual raster, rather than reconnecting to the VRT for each lat/long. The data extraction is slow, so we want to extract all sites at once, not in a loop.
  5. @DongchenZ How do you want the output data formatted for integration into the SDA workflow? In the short term maybe we do this, and in the long term create an intermediate format that then gets parsed to SDA format (list of lists?)? @mdietze?
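As a rough illustration of points 3 and 4 above, here is a minimal sketch (assuming the terra package, an illustrative SoilGrids VRT URL, and a hypothetical site_info data frame; none of this is the merged implementation) of connecting to the virtual raster once and extracting all sites in a single vectorized call:

```r
library(terra)

# hypothetical site list, e.g. as returned from a BETY site query
site_info <- data.frame(
  site_id = c("676", "622"),
  lon = c(-90.07961, -89.34757),
  lat = c(45.80593, 46.24202)
)

# open the remote virtual raster ONCE (illustrative SoilGrids URL)
soc_vrt <- terra::rast(
  "/vsicurl/https://files.isric.org/soilgrids/latest/data/ocs/ocs_0-30cm_mean.vrt"
)

# build points in lat/long, reproject to the raster's CRS, and
# extract all sites in one call instead of looping over the VRT
pts <- terra::vect(site_info, geom = c("lon", "lat"), crs = "EPSG:4326")
soc_vals <- terra::extract(soc_vrt, terra::project(pts, terra::crs(soc_vrt)))
```

The key design point is that the (slow) network-backed raster is opened once and terra::extract is handed the full point set, so the per-site overhead is paid only once.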

I know I am forgetting some things here, but I wanted to get this up now so Cherry can move forward on completing it, if the basic skeleton looks good to folks.

In addition, here are some warnings we might want to look into:

There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In fn(par, ...) : NAs introduced by coercion
[warnings 2-22 repeat the same message]
23: In qgamma(qstat, theta[1], theta[2]) : NaNs produced

serbinsh avatar Jan 12 '22 20:01 serbinsh

The main function is in the soilgrids.R file and is currently called "soilgrids.soc.extract".

An example of how the function works at this time:

rm(list = ls())
library(DBI)
library(PEcAn.data.remote)

# get site info: BETY database connection settings
bety <- list(user = 'bety', password = 'bety', host = 'modex.bnl.gov',
             dbname = 'betydb', driver = RPostgres::Postgres(), write = FALSE)
con <- DBI::dbConnect(drv = bety$driver, dbname = bety$dbname, host = bety$host,
                      password = bety$password, user = bety$user)
con


# test individual sites
suppressWarnings(site_qry <- glue::glue_sql("SELECT *, ST_X(ST_CENTROID(geometry)) AS lon,
                                              ST_Y(ST_CENTROID(geometry)) AS lat FROM sites WHERE id IN ({ids*})",
                                            ids = c("676","622","678","766","764"), .con = con))

suppressWarnings(qry_results.1 <- DBI::dbSendQuery(con,site_qry))
suppressWarnings(qry_results.2 <- DBI::dbFetch(qry_results.1))

qry_results.2
DBI::dbClearResult(qry_results.1)
DBI::dbDisconnect(con)

site_list <- qry_results.2
verbose <- TRUE
system.time(result_soc <- soilgrids.soc.extract(site_info=site_list, verbose=verbose))
result_soc

and the output looks like this:

> result_soc
$Mean_soc
    site_id      lat        lon    0-5cm   5-15cm  15-30cm  30-60cm 60-100cm 100-200cm
622     622 46.24202  -89.34757 65.68854 37.25930 20.36566 7.826109 5.011070  3.126902
676     676 45.80593  -90.07961 49.61828 31.22689 16.61261 8.230412 2.395168  1.734718
678     678 45.94080  -90.27000 55.58001 32.35441 14.74488 6.919386 2.666095  1.773701
764     764 44.31570 -121.60800 74.78708 22.80451 12.64082 8.590361 3.230583  2.568780
766     766 44.43720 -121.56700 69.53522 31.53413 14.27996 7.598909 4.507278  2.793447

$Sdev_soc
    site_id      lat        lon     0-5cm   5-15cm   15-30cm  30-60cm 60-100cm 100-200cm
622     622 46.24202  -89.34757  58.89990 38.01215 13.210002 5.472139 1.294284 2.2844188
676     676 45.80593  -90.07961  45.24691 30.98490  8.879574 6.299148 1.049219 0.6686239
678     678 45.94080  -90.27000  52.04211 33.99952  9.816848 5.382589 1.407709 0.7605814
764     764 44.31570 -121.60800 130.80391 15.02392  9.033595 9.215060 2.353715 1.4176229
766     766 44.43720 -121.56700  77.75313 27.73879  9.968512 6.702437 3.756643 2.3921458

The present function works by creating internal virtual rasters for each SoilGrids depth and then extracting the values for each site in the provided site list.
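The per-depth approach described above can be sketched roughly as follows; the URL pattern and depth labels mirror the public SoilGrids layout but should be treated as illustrative, and pts is assumed to be a terra SpatVector of the sites' coordinates:

```r
# one VRT URL per SoilGrids depth for soil organic carbon (soc)
depths <- c("0-5cm", "5-15cm", "15-30cm", "30-60cm", "60-100cm", "100-200cm")
urls <- sprintf(
  "/vsicurl/https://files.isric.org/soilgrids/latest/data/soc/soc_%s_mean.vrt",
  depths
)
names(urls) <- depths

# for each depth layer, open the VRT and extract every site at once;
# `pts` is a hypothetical SpatVector built from the site list
soc_by_depth <- lapply(urls, function(u) {
  r <- terra::rast(u)
  terra::extract(r, terra::project(pts, terra::crs(r)))
})
```

This yields one extracted data frame per depth, which can then be assembled into the site x depth tables shown above.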

serbinsh avatar Feb 15 '22 15:02 serbinsh

Still needed: the ability to write out an Rdata file containing the results so that we don't have to rerun the extraction every time....duh

I will update.

serbinsh avatar Feb 16 '22 16:02 serbinsh

Updated to output an Rdata file, but I have not yet set up the function to record the file/results in BETY, as I am not yet sure how to manage the lack of a time element when we do have the site ID. Do we try to capture that so that future CMS runs know where SOC data already exists? How do we record which sites are already in the SOC file? This seems complicated if runs only share part of the same site list!
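One possible shape for the caching behavior discussed above is sketched below; the file name, the outdir variable, and the idea of skipping sites already present in the cached results are all assumptions, not the actual PR implementation:

```r
# minimal caching sketch: load cached results if present, and only
# extract the sites that are missing from them
cache_file <- file.path(outdir, "soilgrids_soc_extract.RData")
if (file.exists(cache_file)) {
  load(cache_file)  # restores a previously saved `result_soc`
  todo <- site_list[!(site_list$site_id %in% result_soc$Mean_soc$site_id), ]
} else {
  todo <- site_list
}
if (nrow(todo) > 0) {
  new_soc <- soilgrids.soc.extract(site_info = todo)
  save(new_soc, file = cache_file)  # or merge with cached results first
}
```

Since the results are keyed only by site_id (no time element), a site-level membership check like the one above may be a simpler bookkeeping route than a full BETY record.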

serbinsh avatar Feb 17 '22 15:02 serbinsh

Updating output format too:

> output_data
    site_id      lat       lon    0-5cm   5-15cm  15-30cm  30-60cm 60-100cm 100-200cm stdv_0-5cm stdv_5-15cm stdv_15-30cm stdv_30-60cm stdv_60-100cm stdv_100-200cm
622     622 45.80593 -90.07961 49.61828 31.22689 16.61261 8.230412 2.395168  1.734718   45.24691    30.98490     8.879574     6.299148      1.049219      0.6686239
676     676 46.24202 -89.34757 65.68854 37.25930 20.36566 7.826109 5.011070  3.126902   58.89990    38.01215    13.210002     5.472139      1.294284      2.2844188
678     678 45.94080 -90.27000 55.58001 32.35441 14.74488 6.919386 2.666095  1.773701   52.04211    33.99952     9.816848     5.382589      1.407709      0.7605814

serbinsh avatar Mar 18 '22 14:03 serbinsh

@serbinsh your comments mention a lot of fixes, but they don't seem to have been pushed into the PR yet.

mdietze avatar Mar 24 '22 19:03 mdietze

This is now superseded by @Qianyuxuan's new PR #3040.

There are tests in here that at some point we may want to bring into data.land, but I have left those out for now: they are API tests, and we don't want to hold up checks if the API/URL is unreachable.

serbinsh avatar Oct 07 '22 15:10 serbinsh