RCMIP5

Include global means of key variables in package

Open • bpbond opened this issue 10 years ago • 22 comments

We've gotten permission from LLNL's Karl Taylor (email excerpted below) to include summarized data with the next release of RCMIP5. Let's use this issue to come up with a list of variables. @cahartin, could you and I prepare and include them?

You may include the global means as part of the R package. You should be sure to do the following:

  • Update your global means, when necessary, to accurately reflect the CMIP5 archive. This will prevent known flawed data from being distributed.
  • Indicate to users that the original CMIP5 data (from which your global means are calculated) can be accessed through the ESGF data portals (see http://pcmdi-cmip.llnl.gov/cmip5/availability.html).
  • Provide some indication that the modeling groups themselves have not provided the global means, but that these have been derived by you. You should also provide information about the algorithm used (for example, you would want to tell them that (I presume) you've weighted the grid cell values by the area of the cells).

I think only the first point will require much effort on your part. The archive is relatively stable now, but you might check a couple times each year that no data have been withdrawn/replaced.

bpbond • May 22 '15 17:05
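Karl's third point asks us to describe the averaging algorithm. The sketch below shows an area-weighted global mean of one time slice. It is an illustration in Python/NumPy only, not RCMIP5's own R code. The function name is hypothetical. `cell_area` is assumed to come from the model's areacella file, or from areacello for ocean variables such as tos.

```python
import numpy as np


def area_weighted_global_mean(field, cell_area):
    """Area-weighted mean of a 2-D (lat x lon) field (hypothetical helper).

    field:     values on the model grid, e.g. one year of tas
    cell_area: area of each grid cell in m^2, same shape as field
    Cells where field is NaN (missing or masked) are dropped from both
    the numerator and the denominator.
    """
    field = np.asarray(field, dtype=float)
    cell_area = np.asarray(cell_area, dtype=float)
    valid = ~np.isnan(field)
    return np.sum(field[valid] * cell_area[valid]) / np.sum(cell_area[valid])
```

Take a field of [[10, 20], [30, 40]] with areas [[1, 1], [3, 3]]. The weighted mean is 30.0, while the unweighted mean is 25.0. On a regular lat-lon grid, cells shrink toward the poles, so skipping the weights over-counts high latitudes.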

Candidate variables to include (global and annual, by model and experiment, ensemble means):

  • ph - ocean pH
  • tos - surface ocean temperature
  • tas - surface air temperature
  • nbp - terrestrial net biome production
  • npp - terrestrial net primary production
  • co2 - atmospheric CO2

Others?

bpbond • May 22 '15 17:05

That sounds like a great idea. One more to add:

fgco2 - air-sea CO2 flux

cahartin • May 22 '15 17:05

pr - precipitation

I would prefer to put in the land carbon stocks (cVeg, cLitter, cSoil, cCwd) and the primary fluxes (gpp, ra, and rh) instead of nbp and npp, since LUC (land-use change) is included in the nbp/npp calculations of some models but not others. Admittedly, that shows our clear ecology bias in the variable selection :)

ktoddbrown • May 22 '15 17:05

Hey @cahartin I have added the necessary infrastructure for including this dataset with the package, following the instructions at http://r-pkgs.had.co.nz/data.html.

Next steps:

  • Generate global data summariesª (you've already done this, I think)
  • Combine them into a single dataset in R
  • In a new file, R/datasets.R, provide detailed documentation, both descriptive (see top of file) and code
  • Use the devtools::use_data command to add the dataset to the package

ª Fields should include: variable, model, experiment, year, value, value_sd... What else?

bpbond • May 27 '15 18:05
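To make the proposed layout concrete, here is a hypothetical Python sketch of a long-format table built from the fields listed above. Each row covers one variable/model/experiment/year. A helper concatenates per-variable results into a single table. In the package itself this would be an R data frame added with devtools::use_data. The class and function names here are illustrative only.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class GlobalSummary:
    """One row of the proposed summary dataset (hypothetical class name)."""
    variable: str
    model: str
    experiment: str
    year: int
    value: float      # global mean (or sum) for that year
    value_sd: float   # spread between grid cells


def combine_summaries(*per_variable_rows):
    """Concatenate per-variable lists of rows into one long table,
    sorted by variable, model, experiment, and year."""
    combined = [row for rows in per_variable_rows for row in rows]
    return sorted(combined, key=lambda r: (r.variable, r.model, r.experiment, r.year))


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(GlobalSummary)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()
```

With one row per year and no wide per-model columns, adding a variable, model, or ensemble member later only adds rows.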

Sounds good, @bpbond. What about ensembles? Do we want to average them, or get global means for each ensemble member?

cahartin • May 27 '15 18:05

If we have the individual ensembles already processed, sure, why not break them out separately.

bpbond • May 27 '15 22:05

In the RCMIP5 package, is there an option for calculating the standard deviation?

cahartin • Jun 05 '15 14:06

Between ensemble members, during loadCMIP5, you mean? No; mean, max, min, and sum are the only statistics supported. You'd need to load the ensemble members individually, combine the data, and calculate the sd.

bpbond • Jun 05 '15 14:06
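Here is a minimal Python/NumPy sketch of the load-individually-then-combine approach, as a stand-in for the R workflow. It assumes each ensemble member has already been reduced to a global annual series covering the same years. The function name is hypothetical. Using the sample standard deviation (ddof=1) is also an assumption.

```python
import numpy as np


def ensemble_mean_and_sd(member_series):
    """Per-year mean and standard deviation across ensemble members.

    member_series: list of 1-D sequences, one per member (r1i1p1, r2i1p1, ...),
                   all the same length (same years)
    Returns (mean, sd) as arrays; sd is the sample sd (ddof=1) and is
    NaN when only one member is available.
    """
    stacked = np.vstack([np.asarray(s, dtype=float) for s in member_series])
    mean = stacked.mean(axis=0)
    if stacked.shape[0] < 2:
        sd = np.full(stacked.shape[1], np.nan)
    else:
        sd = stacked.std(axis=0, ddof=1)
    return mean, sd
```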

I guess I'm not 100% sure which standard deviation we want to calculate, given "Fields should include: variable, model, experiment, year, value, value_sd...what else?"

cahartin • Jun 05 '15 14:06

Oh, for me that was just the global s.d. (i.e. between grid cells). So it's just a second call to makeGlobalStat.

bpbond • Jun 05 '15 14:06
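For reference, the sketch below gives one common definition of an area-weighted standard deviation across grid cells, in Python/NumPy with a hypothetical function name. It uses the weighted population variance. RCMIP5's makeGlobalStat may define the spread differently, so check this choice against the package before relying on it.

```python
import numpy as np


def area_weighted_mean_sd(field, cell_area):
    """Area-weighted mean and standard deviation across grid cells.

    Uses the weighted population variance
        var = sum(w * (x - mean)^2) / sum(w),  with w = cell area,
    and skips cells where the field is NaN.
    """
    x = np.asarray(field, dtype=float).ravel()
    w = np.asarray(cell_area, dtype=float).ravel()
    valid = ~np.isnan(x)
    x, w = x[valid], w[valid]
    mean = np.average(x, weights=w)
    var = np.average((x - mean) ** 2, weights=w)
    return mean, np.sqrt(var)
```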

That makes sense. I'll add that in now.

cahartin • Jun 05 '15 14:06

I am beginning to reprocess some of the variables. Here is my complete list of variables that will go into v1.2:

  • ocean: ph, tos, spco2, fgco2
  • land: nbp, npp, cVeg, cLitter, cSoil, cCwd*
  • atmosphere: pr, co2, tas

(*) @ktoddbrown, I do not have cCwd downloaded. How critical is it to include? I am also not confident that I have all of the data for the carbon pool variables, and downloading the rest would take me some time. If you happen to have all of the carbon pool data, could I send you the script I have been using to process the data, so you can run it for those variables?

cahartin • Jun 05 '15 15:06

What do people think of using gpp, ra, and rh instead of npp/nbp? I understand that npp/nbp are inconsistently defined across the models, with some including LUC and some not. cCwd is minor (only CLM models have it) but is required for carbon balance.

You would also need to download the areacella and sftlf files to get the correct land area.

@cahartin, I can process the land variables, but I would also need to double-check that I have all of them. I've focused on a subset of 11 'representative' models. If you commit the code to the repository here (@bpbond, where do you think it should go? I'm tempted to say the data directory, but that feels wrong), I can sync and run everything.

ktoddbrown • Jun 05 '15 16:06
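Here is a sketch of how the two fixed fields combine, in Python/NumPy with hypothetical names. The effective land area of each cell is areacella scaled by the land fraction. This assumes sftlf is stored in percent (0-100), as in the CMIP5 fixed fields. Check the file's units attribute before relying on it.

```python
import numpy as np


def land_cell_area(areacella, sftlf_percent):
    """Effective land area per grid cell (m^2).

    areacella:     grid-cell area (m^2)
    sftlf_percent: land area fraction, assumed to be in percent (0-100)
    """
    return np.asarray(areacella, dtype=float) * np.asarray(sftlf_percent, dtype=float) / 100.0


def global_land_sum(field_per_m2, areacella, sftlf_percent):
    """Global land total of a per-area quantity (e.g. cSoil in kg m-2 -> kg).

    Cells where the field is NaN (e.g. ocean cells masked in land output)
    are skipped.
    """
    x = np.asarray(field_per_m2, dtype=float)
    area = land_cell_area(areacella, sftlf_percent)
    valid = ~np.isnan(x)
    return float(np.sum(x[valid] * area[valid]))
```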

@ktoddbrown, if it's only a few files, downloading them is not a problem. I already have areacella and sftlf downloaded. That brings up a good point: do we want all models or just the 11 representative models? (I used 10-11 models in my analyses as well.)

cahartin • Jun 05 '15 16:06

Re code location, I'd say put it in unused/ for now so it doesn't interfere with the package build. Slightly longer term, let's probably make it an internal RCMIP5 function: not normally accessible to the user, but there for our convenience and to document how these data were generated.

Re models, we're providing model-specific data, so we might as well provide everything we have on hand, not just 11.

bpbond • Jun 05 '15 18:06

@ktoddbrown and @bpbond, to keep things simple and be ready in time for the next release, I decided to include just the major variables (tas, tos, co2, and pr). I have all of the data and the time to process these. We can always add more variables in later releases. Does this work for you both?

cahartin • Jun 10 '15 16:06

Yes. You (we) have lots of things going on, so let's keep it simple. We can expand in the future.

bpbond • Jun 10 '15 16:06

pr is a flux, so we want global sums of pr, not global averages, correct?

cahartin • Jun 15 '15 19:06

Ideally. But just doing means for everything would be fine too, leaving the multiplication by area to the user. That's simpler for us.

bpbond • Jun 15 '15 19:06

I would suggest using a global sum where we can, to avoid confusion over ocean vs. land vs. total surface area.

ktoddbrown • Jun 15 '15 20:06
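The relationship behind this discussion can be sketched in Python/NumPy, with hypothetical function names. For a flux such as pr (kg m-2 s-1), the global sum equals the area-weighted mean multiplied by the area it was averaged over. A user given only means therefore has to know which area (land, ocean, or whole surface) to multiply by. The 365-day year used for the annual total is an assumption, since model calendars differ.

```python
import numpy as np

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day (noleap) calendar


def global_flux_total(flux, cell_area):
    """Global sum of a flux such as pr (kg m-2 s-1 -> kg s-1); NaN cells skipped."""
    flux = np.asarray(flux, dtype=float)
    cell_area = np.asarray(cell_area, dtype=float)
    valid = ~np.isnan(flux)
    return float(np.sum(flux[valid] * cell_area[valid]))


def global_flux_mean(flux, cell_area):
    """Area-weighted global mean of the same flux (kg m-2 s-1)."""
    flux = np.asarray(flux, dtype=float)
    cell_area = np.asarray(cell_area, dtype=float)
    valid = ~np.isnan(flux)
    return global_flux_total(flux, cell_area) / float(np.sum(cell_area[valid]))


def annual_total_kg(flux, cell_area):
    """Global total over one year (kg), assuming a constant flux for the year."""
    return global_flux_total(flux, cell_area) * SECONDS_PER_YEAR
```

Publishing the sum avoids the ambiguity entirely. Publishing only the mean is simpler for us, but the user must then multiply by the matching area.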

Thanks.

cahartin • Jun 15 '15 20:06

As noted in my last commit message, I'm assuming we're putting this off until 1.3.

bpbond • Jul 29 '16 19:07