Take a more systematic approach to handling currency units

Open pralitp opened this issue 3 years ago • 2 comments

There are a lot of currency unit conversions going on in gcamdata, mostly dollar-year deflator adjustments. Sometimes values are converted before they enter gcamdata; in other cases they are input in their original dollar year (hopefully with that dollar year documented in the comments) and then converted using the pipeline-helper function gdp_deflator.

In either case the conversion is applied ad hoc and is prone to introducing subtle bugs. A formal approach would also let us change the dollar year used in GCAM, which has long been a distant goal.

Some preliminary thoughts on a design:

  • I think we have been pretty good about only using gdp_deflator to do dollar unit conversions, so I would center the solution around that.
  • Enforce some sort of standard, comment-based metadata for any input CSV that has price/cost columns. It would have to be able to target individual columns. We would want to attach it to the columns with attr (how far/long do we need to maintain this metadata, and is that going to be hard?). Any time gdp_deflator is called it would use this metadata instead of the user supplying the dollar year manually (should we still allow an explicit param / override?).
  • Keep a gcam.BASE_DOLLAR_YEAR constant in constants.R and make that the default argument for the year to convert to in gdp_deflator.
  • Maybe always automatically call gdp_deflator in get_data on any columns that have price unit metadata, to convert them to gcam.BASE_DOLLAR_YEAR (add a flag to skip this for some corner case?). Otherwise actually applying the conversion may still be ad hoc, although we could also check after the fact in a package test instead.
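
To make the bullets above concrete, here is a minimal base-R sketch of how they might fit together. Everything in it is hypothetical: toy_index holds placeholder numbers (not real deflator data), toy_deflator only stands in for gdp_deflator (it assumes a "ratio of the target year to the source year" convention, which may not match the real helper), the value 1975 for gcam.BASE_DOLLAR_YEAR is just a placeholder, and the attribute name dollar_year and the functions convert_dollars and convert_price_columns are made up for illustration.

```r
# Placeholder for the constant proposed for constants.R (value is illustrative only).
gcam.BASE_DOLLAR_YEAR <- 1975

# Placeholder price index. These are NOT real deflator values.
toy_index <- c(`1975` = 1.0, `1990` = 2.0, `2015` = 4.0)

# Stand-in for gdp_deflator: multiplier that takes `base_year` dollars to `year` dollars.
toy_deflator <- function(year, base_year) {
  unname(toy_index[as.character(year)] / toy_index[as.character(base_year)])
}

# Convert one column, reading the source dollar year from its metadata
# unless the caller overrides it explicitly via `from_year`.
convert_dollars <- function(x, to_year = gcam.BASE_DOLLAR_YEAR,
                            from_year = attr(x, "dollar_year")) {
  if (is.null(from_year)) stop("column has no dollar_year metadata")
  out <- as.numeric(x) * toy_deflator(to_year, from_year)
  attr(out, "dollar_year") <- to_year  # keep the metadata in sync with the values
  out
}

# What an automatic step in get_data might do: convert every tagged column.
convert_price_columns <- function(df, to_year = gcam.BASE_DOLLAR_YEAR) {
  tagged <- vapply(df, function(col) !is.null(attr(col, "dollar_year")), logical(1))
  for (nm in names(df)[tagged]) {
    df[[nm]] <- convert_dollars(df[[nm]], to_year)
  }
  df
}

# Usage: tag the cost column as 1990$, then convert the whole table.
df <- data.frame(fuel = c("coal", "gas"), cost = c(10, 20))
attr(df$cost, "dollar_year") <- 1990
out <- convert_price_columns(df)
```

On the "how long do we need to maintain this metadata" question: in base R, subsetting a plain vector with `[` drops custom attributes like this one, so an approach along these lines would probably need to convert early (e.g. in get_data) or use a small S3 class rather than rely on the attribute surviving a long pipeline.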

pralitp, Nov 02 '22 13:11

So if I'm understanding correctly, whenever an input file contains a column with price units:

  • we would use a custom col_types for read_csv(). For example, we would use p rather than n (numeric) for that column, which would tell the code to look for price metadata. We would still read the column in as numeric, but the code would throw an error if the price metadata isn't available.
  • we would have another metadata category in the file header, something like # PriceUnits: 1990$ or # PriceUnits: 1990$/GJ, from which we would parse out the year to plug into gdp_deflator()
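
A rough base-R sketch of those two steps, assuming the header format suggested above. The function names split_col_types and parse_price_year are hypothetical, and nothing here calls readr itself; it only shows how a p code could be translated back to n and how the dollar year could be pulled out of a # PriceUnits: line.

```r
# Translate a compact col_types string that may contain the custom "p" code.
# Returns the string to hand to the reader (with "p" read as "n") plus the
# positions of the price columns.
split_col_types <- function(col_types) {
  codes <- strsplit(col_types, "")[[1]]
  list(
    reader_types = paste(ifelse(codes == "p", "n", codes), collapse = ""),
    price_cols   = which(codes == "p")
  )
}

# Pull the four-digit dollar year out of a "# PriceUnits: 1990$/GJ" header line.
# Errors if a price column was requested but the metadata is missing or malformed.
parse_price_year <- function(header_lines) {
  line <- grep("^#[[:space:]]*PriceUnits:", header_lines, value = TRUE)
  if (length(line) == 0) stop("no '# PriceUnits:' header line found")
  year <- regmatches(line[1], regexpr("[0-9]{4}(?=\\$)", line[1], perl = TRUE))
  if (length(year) == 0) stop("could not find a dollar year in: ", line[1])
  as.integer(year)
}

# Usage with a made-up file header.
header <- c("# File: example.csv", "# PriceUnits: 1990$/GJ", "# Column types: cip")
types <- split_col_types("cip")
price_year <- parse_price_year(header)
```

A single # PriceUnits: line only covers files where every price column shares one dollar year; per-column targeting would need a richer header format than this sketch assumes.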

Sound right @pralitp?

russellhz, Nov 02 '22 16:11

So to me the first step of this process is removing the 1990$ used in the C++ code, along with its associated hard-wired unit conversions in the transportation and CO2 prices code (there's a constant called CVRT90 that does this; perhaps there are others). Once that is done there can be a single GCAM dollar base year, and from then on it shouldn't be too hard to automate generating the correct price units and the correct values to pass to gdp_deflator() wherever it is called in the gcamdata code.

pkyle, Nov 02 '22 17:11