
Improve checking of global attributes in CMOR (enhancement)

Open ehogan opened this issue 7 years ago • 11 comments

Following #88 (which was closed without being resolved), using CMOR 3.4.0, I note the following with regard to the required global attributes:

| Attribute | Form | Checked? | Replaced? | Logged? |
|---|---|---|---|---|
| activity_id | CV | No | No | No |
| experiment | CV | No | No | No |
| experiment_id | CV | No | No | No |
| forcing_index | int | Yes | N/A | N/A |
| further_info_url | CV | Yes | N/A | N/A |
| grid | Free form | No | No | No |
| grid_label | CV | No | No | No |
| initialisation_index | int | Yes | N/A | N/A |
| institution | Registered | No | Yes | Yes |
| institution_id | Registered | Yes | N/A | N/A |
| license | Some required text | Yes | N/A | N/A |
| nominal_resolution | CV | Yes | N/A | N/A |
| physics_index | int | Yes | N/A | N/A |
| realization_index | int | Yes | N/A | N/A |
| source | Registered | No | Yes | No |
| source_id | Registered | Yes | N/A | N/A |
| source_type | CV | No | No | No |
| sub_experiment | CV | No | No | No |
| sub_experiment_id | CV | No | No | No |
| variant_label | CMOR | No | Yes | No |

Comparing this with Table 3 (page 18) in the CMIP6 Global Attributes document, I believe the following changes should be made to CMOR for CMOR to be consistent with this document:

  • replace and log activity_id (available via experiment_id)
  • replace and log experiment (available via experiment_id)
  • check experiment_id against CVs
  • check grid_label against CVs
  • log source replacement
  • check source_type against CVs
  • replace and log sub_experiment (available via sub_experiment_id)
  • check sub_experiment_id against CVs
  • log variant_label replacement

Note: without these checks, it is possible for me to create the following file without issues:

output/CMIP6/a/MOHC/UKESM1-0-LL/c/s-r1i1p1f1/Omon/masso/g/v20190121/masso_Omon_UKESM1-0-LL_c_s-r1i1p1f1_g_201001-201005.nc

where a, c, s and g are activity_id, experiment_id, sub_experiment_id and grid_label, respectively.
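
To illustrate the kind of check being requested, here is a minimal sketch in Python. It is not CMOR's implementation: `TOY_CV` and `check_against_cv` are hypothetical names, and the vocabulary is a tiny illustrative subset rather than the contents of CMIP6_CV.json.

```python
# Toy subset of controlled vocabularies (an assumption for illustration);
# a real check would load the full lists from CMIP6_CV.json.
TOY_CV = {
    "experiment_id": {"amip", "historical", "piControl"},
    "grid_label": {"gn", "gr"},
    "sub_experiment_id": {"none"},
}


def check_against_cv(attrs, cv=TOY_CV):
    """Return (attribute, value) pairs whose value is not in the CV."""
    bad = []
    for name, allowed in cv.items():
        value = attrs.get(name)
        if value not in allowed:
            bad.append((name, value))
    return bad


# The single-letter placeholder values from the file path above.
placeholder = {
    "activity_id": "a",
    "experiment_id": "c",
    "sub_experiment_id": "s",
    "grid_label": "g",
}

for name, value in check_against_cv(placeholder):
    print(f"{name}={value!r} is not in the CV")
```

With a check like this in place, the placeholder values would be rejected before any file is written.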

ehogan avatar Jan 21 '19 13:01 ehogan

Adding "_cmip6_option": "CMIP6", to the cmor_dataset JSON file only improves this slightly:

  • source replacement is logged
  • experiment_id is checked
  • grid_label is checked

ehogan avatar Jan 21 '19 14:01 ehogan

Adding the appropriate values for experiment_id and grid_label improves this further:

  • activity_id is replaced and logged
  • experiment is replaced and logged
  • source_type is checked (but doesn't cause CMOR to exit with an error, resulting in the writing of a bad value to the output netCDF file)
  • sub_experiment_id is replaced and logged (according to Table 3, this value should come from the user)
  • sub_experiment is replaced and logged

However, the replacement value for sub_experiment_id is not used in the path or filename that is written:

output/CMIP6/CMIP/MOHC/UKESM1-0-LL/amip/s-r1i1p1f1/Omon/masso/gn/v20190121/masso_Omon_UKESM1-0-LL_amip_s-r1i1p1f1_gn_201001-201005.nc

ehogan avatar Jan 21 '19 14:01 ehogan

Hi Emma, Thanks for bringing these problems to our attention.

You refer to a "cmor_dataset JSON" file. I should probably recognize this, but don't. So we're all looking at the same input, could you attach that file?

thanks, Karl

taylor13 avatar Jan 21 '19 16:01 taylor13

Sure :)

cmor_dataset_json.txt

I started with an empty file and added attributes (with initial values equal to a letter of the alphabet) when CMOR reported they were required. I then updated the values when CMOR told me to (that's why there are still some values with letters). I'm using the usual masso test example to run CMOR.

ehogan avatar Jan 21 '19 17:01 ehogan

Hi Emma,

We're trying to improve the documentation of what you call the cmor_dataset JSON file. The example file we want folks to start with is: https://github.com/PCMDI/cmor/blob/master/Test/CMOR_input_example.json

The last lines of this input_example file should not be changed by those using CMOR. I noticed that you have removed the following lines from your version of the file. I'm surprised that doesn't break stuff. If you put those lines back in, do things improve at all? Here are the lines that seem to be missing from your cmor_dataset JSON file:

    "_AXIS_ENTRY_FILE":         "CMIP6_coordinate.json",
    "_FORMULA_VAR_FILE":        "CMIP6_formula_terms.json",
 
    "mip_era":                "CMIP6",
    "parent_mip_era":         "CMIP6",
 
    "tracking_prefix":        "hdl:21.14100",
    "_history_template":       "%s ;rewrote data to be consistent with <activity_id> for variable <variable_id> found in table <table_id>.",
 
    "#output_path_template":   "Template for output path directory using tables keys or global attributes, these should follow the relevant data reference syntax",
    "output_path_template":    "<mip_era><activity_id><institution_id><source_id><experiment_id><_member_id><table><variable_id><grid_label><version>",
    "output_file_template":    "<variable_id><table><source_id><experiment_id><_member_id><grid_label>",

taylor13 avatar Jan 21 '19 17:01 taylor13

I noticed that we're missing "further_info_url" in https://github.com/PCMDI/cmor/blob/master/Test/CMOR_input_example.json
We should add:

"further_info_url":  "",

below the "references" entry in the table.

taylor13 avatar Jan 21 '19 17:01 taylor13

Thanks Karl :) The _AXIS_ENTRY_FILE and _FORMULA_VAR_FILE attributes have default values available in cmor.h. I can add the mip_era with a random value and it is completely ignored (I think the mip_era is obtained by other means). I don't need the parent_mip_era in this case (AMIP doesn't have a parent, so this attribute shouldn't be required). I already had the tracking_prefix at the end of my JSON file. The _history_template, output_path_template and output_file_template attributes also have default values in cmor.h.

I still tried adding these attributes to the end of my file, but there was no difference in the log or output netCDF file.
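
For reference, the way the two templates quoted above map onto the path and filename reported earlier can be sketched as follows. This only mimics the observed output (path components joined with "/", filename components joined with "_"); `expand_template` is a hypothetical helper, not a CMOR function, and the time range and ".nc" suffix that CMOR appends to the filename are left out.

```python
import re


def expand_template(template, attrs, sep):
    """Replace each <key> in the template with attrs[key], joined by sep."""
    keys = re.findall(r"<([^<>]+)>", template)
    return sep.join(attrs[key] for key in keys)


PATH_TEMPLATE = (
    "<mip_era><activity_id><institution_id><source_id><experiment_id>"
    "<_member_id><table><variable_id><grid_label><version>"
)
FILE_TEMPLATE = (
    "<variable_id><table><source_id><experiment_id><_member_id><grid_label>"
)

# Values taken from the path reported earlier in this thread.
attrs = {
    "mip_era": "CMIP6",
    "activity_id": "CMIP",
    "institution_id": "MOHC",
    "source_id": "UKESM1-0-LL",
    "experiment_id": "amip",
    "_member_id": "s-r1i1p1f1",
    "table": "Omon",
    "variable_id": "masso",
    "grid_label": "gn",
    "version": "v20190121",
}

print(expand_template(PATH_TEMPLATE, attrs, "/"))
print(expand_template(FILE_TEMPLATE, attrs, "_"))
```

Note that `_member_id` still starts with the placeholder "s", which is the unreplaced sub_experiment_id described above.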

ehogan avatar Jan 22 '19 13:01 ehogan

With regards to the further_info_url, a minimum value of https://furtherinfo.es-doc.org/ is required, since it is checked against the CVs here. Also, I note that CMOR_DEFAULT_FURTHERURL_TEMPLATE exists; could CMOR use this to automatically construct the further_info_url?
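
As a rough sketch of what automatic construction could look like: the template string below is an assumption based on the form given in the CMIP6 Global Attributes document, and I have not checked it against the actual value of CMOR_DEFAULT_FURTHERURL_TEMPLATE in cmor.h. `build_further_info_url` is a hypothetical name.

```python
import re

# Assumed template form (not verified against cmor.h).
ASSUMED_TEMPLATE = (
    "https://furtherinfo.es-doc.org/"
    "<mip_era>.<institution_id>.<source_id>."
    "<experiment_id>.<sub_experiment_id>.<variant_label>"
)


def build_further_info_url(attrs, template=ASSUMED_TEMPLATE):
    """Substitute each <key> in place; a missing attribute raises KeyError."""
    return re.sub(r"<([^<>]+)>", lambda match: attrs[match.group(1)], template)


attrs = {
    "mip_era": "CMIP6",
    "institution_id": "MOHC",
    "source_id": "UKESM1-0-LL",
    "experiment_id": "amip",
    "sub_experiment_id": "none",
    "variant_label": "r1i1p1f1",
}

print(build_further_info_url(attrs))
```

Since every attribute in the template is already required, the URL could be derived rather than supplied by the user.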

ehogan avatar Jan 22 '19 13:01 ehogan

Thank you for this systematic review of the CMOR checks. I may have an additional check for CMOR:

Within the source_id attribute in CMIP6_CV.json, there is the subattribute activity_participation. What is the procedure if a model produces data for an experiment defined by an activity that is not listed among the supported activities in that subattribute?

I can create those files with CMOR, so I guess that PrePARE will not give errors either. However, this might cause problems because, at operational scale, those files might be published before someone notices that the activity is not supported. For example, many source_ids differ by only one character. So if someone forgets to change LR to HR in the CMOR dataset attributes when switching models, CMOR will not stop but will instead change all connected attributes.

wachsylon avatar Feb 19 '19 11:02 wachsylon

@durack1 may have something to say about this, but as I recall, the activity_participation list was meant to be purely informal and not definitive. We asked the groups to include all the activities they were interested in, but they may have left some out (and they may have included some that they had interest in, but in the end could not do). With this understanding, it would be a mistake for CMOR to raise an error if a source participated in an activity that wasn't in the CV list.
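
If a check were wanted at all, a non-fatal version consistent with that understanding could be sketched as below. This is only an illustration: the source names MODEL-LR and MODEL-HR and their activity lists are hypothetical, and `check_activity` is not an existing CMOR or PrePARE function. The nested layout mirrors the source_id entries in CMIP6_CV.json.

```python
import warnings

# Hypothetical source_id entries, shaped like those in CMIP6_CV.json.
TOY_SOURCE_CV = {
    "MODEL-LR": {"activity_participation": ["CMIP", "ScenarioMIP"]},
    "MODEL-HR": {"activity_participation": ["CMIP", "HighResMIP"]},
}


def check_activity(source_id, activity_id, source_cv=TOY_SOURCE_CV):
    """Warn (never raise) when the activity is not registered for the source."""
    listed = source_cv[source_id]["activity_participation"]
    if activity_id in listed:
        return True
    warnings.warn(
        f"{activity_id} is not in activity_participation for {source_id}: {listed}"
    )
    return False


print(check_activity("MODEL-HR", "HighResMIP"))  # registered activity
print(check_activity("MODEL-LR", "HighResMIP"))  # warns, but does not stop
```

A warning of this kind would flag an LR/HR mix-up without blocking groups whose registered list is simply out of date.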

I don't know how CMOR could detect a typo in the source_id, as you suggest might happen. It would populate "source" (and other attributes) consistent with the wrong source_id, but doesn't it send a warning message that it is doing this?

taylor13 avatar Feb 19 '19 16:02 taylor13

@taylor13 is correct. At the time of registration in https://github.com/WCRP-CMIP/CMIP6_CVs (which began in 2016), each model was registered against the list of MIPs that they intended to contribute to. This list of MIPs has expanded over time (CDRMIP and PAMIP), so there may be instances where the registration doesn't match the current intent.

durack1 avatar Feb 25 '19 19:02 durack1

This is relevant for the ongoing work in the mip-cmor-tables and Project_CVs (e.g. CMIP6Plus_CVs), but not for planned CMOR3 work, so I will close this.

Ping @wolfiex @taylor13 @matthew-mizielinski

durack1 avatar Apr 07 '24 16:04 durack1