Improve checking of global attributes in CMOR (enhancement)
Following #88 (which was closed without being resolved), and using CMOR 3.4.0, I note the following with regard to the required global attributes:
| Attribute | Form | Checked? | Replaced? | Logged? |
|---|---|---|---|---|
| activity_id | CV | No | No | No |
| experiment | CV | No | No | No |
| experiment_id | CV | No | No | No |
| forcing_index | int | Yes | N/A | N/A |
| further_info_url | CV | Yes | N/A | N/A |
| grid | Free form | No | No | No |
| grid_label | CV | No | No | No |
| initialisation_index | int | Yes | N/A | N/A |
| institution | Registered | No | Yes | Yes |
| institution_id | Registered | Yes | N/A | N/A |
| license | Some required text | Yes | N/A | N/A |
| nominal_resolution | CV | Yes | N/A | N/A |
| physics_index | int | Yes | N/A | N/A |
| realization_index | int | Yes | N/A | N/A |
| source | Registered | No | Yes | No |
| source_id | Registered | Yes | N/A | N/A |
| source_type | CV | No | No | No |
| sub_experiment | CV | No | No | No |
| sub_experiment_id | CV | No | No | No |
| variant_label | CMOR | No | Yes | No |
Comparing this with Table 3 (page 18) in the CMIP6 Global Attributes document, I believe the following changes should be made to CMOR for CMOR to be consistent with this document:
- replace and log activity_id (available via experiment_id)
- replace and log experiment (available via experiment_id)
- check experiment_id against CVs
- check grid_label against CVs
- log source replacement
- check source_type against CVs
- replace and log sub_experiment (available via sub_experiment_id)
- check sub_experiment_id against CVs
- log variant_label replacement
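To make the distinction between "check" and "replace and log" concrete, here is a minimal Python sketch. It is not CMOR code: the `CV` dictionary is a hypothetical, heavily trimmed stand-in for CMIP6_CV.json, and the function names are invented for illustration.

```python
import logging

logger = logging.getLogger("cv_check_sketch")

# Hypothetical, heavily trimmed stand-in for CMIP6_CV.json (illustrative values only).
CV = {
    "experiment_id": {
        "amip": {"activity_id": "CMIP", "experiment": "AMIP"},
    },
    "grid_label": ["gn", "gr"],
}


def check_against_cv(attrs, name, cv=CV):
    """'Check': raise ValueError if attrs[name] is not a registered value."""
    value = attrs.get(name)
    if value not in cv[name]:
        raise ValueError(f"{name}={value!r} is not in the CV")


def replace_and_log(attrs, name, cv=CV):
    """'Replace and log': overwrite attrs[name] with the value implied by
    experiment_id, logging a warning when the user's value differed."""
    expected = cv["experiment_id"][attrs["experiment_id"]][name]
    if attrs.get(name) != expected:
        logger.warning("replacing %s=%r with %r", name, attrs.get(name), expected)
        attrs[name] = expected


def validate(attrs):
    """Check the user-supplied attributes first, then derive the dependent ones.
    Returns a corrected copy; the input dictionary is left untouched."""
    fixed = dict(attrs)
    check_against_cv(fixed, "experiment_id")
    check_against_cv(fixed, "grid_label")
    for name in ("activity_id", "experiment"):
        replace_and_log(fixed, name)
    return fixed
```

With this sketch, `validate({"activity_id": "a", "experiment_id": "amip", "grid_label": "gn"})` replaces activity_id and logs the change, while an unregistered experiment_id or grid_label stops processing with an error.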
Note: without these checks, it is possible for me to create the following file without issues:
output/CMIP6/a/MOHC/UKESM1-0-LL/c/s-r1i1p1f1/Omon/masso/g/v20190121/masso_Omon_UKESM1-0-LL_c_s-r1i1p1f1_g_201001-201005.nc
where a, c, s and g are activity_id, experiment_id, sub_experiment_id and grid_label, respectively.
Adding "_cmip6_option": "CMIP6", to the cmor_dataset JSON file only improves this slightly:
- source replacement is logged
- experiment_id is checked
- grid_label is checked
Adding the appropriate values for experiment_id and grid_label improves this further:
- activity_id is replaced and logged
- experiment is replaced and logged
- source_type is checked (but doesn't cause CMOR to exit with an error, resulting in the writing of a bad value to the output netCDF file)
- sub_experiment_id is replaced and logged (according to Table 3, this value should come from the user)
- sub_experiment is replaced and logged
However, the replacement value for sub_experiment_id is not used in the path or filename that is written:
output/CMIP6/CMIP/MOHC/UKESM1-0-LL/amip/s-r1i1p1f1/Omon/masso/gn/v20190121/masso_Omon_UKESM1-0-LL_amip_s-r1i1p1f1_gn_201001-201005.nc
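The following Python sketch shows the path I would expect if the replaced sub_experiment_id were used. It is not CMOR's implementation: `member_id` and `render` are hypothetical helpers, the "/" separator is inferred from the paths above, and I am assuming that the replacement value of sub_experiment_id for amip is "none". The member_id rule follows my reading of the CMIP6 Global Attributes document (the variant_label alone when sub_experiment_id is "none", otherwise the two joined by a hyphen).

```python
import re


def member_id(sub_experiment_id, variant_label):
    """Return variant_label alone when sub_experiment_id is "none",
    otherwise "<sub_experiment_id>-<variant_label>"."""
    if sub_experiment_id == "none":
        return variant_label
    return f"{sub_experiment_id}-{variant_label}"


def render(template, values, sep):
    """Substitute each <key> token in the template and join the results with sep."""
    keys = re.findall(r"<([^<>]+)>", template)
    return sep.join(values[key] for key in keys)


PATH_TEMPLATE = (
    "<mip_era><activity_id><institution_id><source_id><experiment_id>"
    "<_member_id><table><variable_id><grid_label><version>"
)

values = {
    "mip_era": "CMIP6",
    "activity_id": "CMIP",
    "institution_id": "MOHC",
    "source_id": "UKESM1-0-LL",
    "experiment_id": "amip",
    # Assumption: the replaced sub_experiment_id for amip is "none".
    "_member_id": member_id("none", "r1i1p1f1"),
    "table": "Omon",
    "variable_id": "masso",
    "grid_label": "gn",
    "version": "v20190121",
}

path = render(PATH_TEMPLATE, values, "/")
print(path)
```

Under these assumptions the member directory would be r1i1p1f1 rather than s-r1i1p1f1.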
Hi Emma, Thanks for bringing these problems to our attention.
You refer to a "cmor_dataset JSON" file. I should probably recognize this, but don't. So we're all looking at the same input, could you attach that file?
thanks, Karl
Sure :)
I started with an empty file and added attributes (with initial values equal to a letter of the alphabet) when CMOR reported they were required. I then updated the values when CMOR told me to (that's why some values are still letters). I'm using the usual masso test example to run CMOR.
Hi Emma,
We're trying to improve the documentation of what you call the cmor_dataset JSON file. The example file we want folks to start with is: https://github.com/PCMDI/cmor/blob/master/Test/CMOR_input_example.json
The last lines of this input_example file should not be changed by those using CMOR. I noticed that you have removed the following lines from your version of the file. I'm surprised that doesn't break stuff. If you put those lines back in, do things improve at all? Here are the lines that seem to be missing from your cmor_dataset JSON file:
"_AXIS_ENTRY_FILE": "CMIP6_coordinate.json",
"_FORMULA_VAR_FILE": "CMIP6_formula_terms.json",
"mip_era": "CMIP6",
"parent_mip_era": "CMIP6",
"tracking_prefix": "hdl:21.14100",
"_history_template": "%s ;rewrote data to be consistent with <activity_id> for variable <variable_id> found in table <table_id>.",
"#output_path_template": "Template for output path directory using tables keys or global attributes, these should follow the relevant data reference syntax",
"output_path_template": "<mip_era><activity_id><institution_id><source_id><experiment_id><_member_id><table><variable_id><grid_label><version>",
"output_file_template": "<variable_id><table><source_id><experiment_id><_member_id><grid_label>",
I noticed that we're missing "further_info_url" in https://github.com/PCMDI/cmor/blob/master/Test/CMOR_input_example.json
We should add:
"further_info_url": "",
below the "references" entry in the table.
Thanks Karl :) The _AXIS_ENTRY_FILE and _FORMULA_VAR_FILE attributes have default values available in cmor.h. I can add the mip_era with a random value and it is completely ignored (I think the mip_era is obtained by other means). I don't need the parent_mip_era in this case (AMIP doesn't have a parent, so this attribute shouldn't be required). I already had the tracking_prefix at the end of my JSON file. The _history_template, output_path_template and output_file_template attributes also have default values in cmor.h.
I still tried adding these attributes to the end of my file, but there was no difference in the log or output netCDF file.
With regards to the further_info_url, a minimum value of https://furtherinfo.es-doc.org/ is required, since it is checked against the CVs here. Also, I note that CMOR_DEFAULT_FURTHERURL_TEMPLATE exists; could CMOR use this to automatically construct the further_info_url?
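As a rough illustration of what automatic construction could look like, here is a Python sketch. It does not reproduce CMOR_DEFAULT_FURTHERURL_TEMPLATE, whose contents I have not quoted here. The dot-separated layout and the order of the fields are an assumption based on my reading of the CMIP6 Global Attributes document, and `build_further_info_url` is a hypothetical name.

```python
FURTHER_INFO_BASE = "https://furtherinfo.es-doc.org/"

# Assumed field order, taken from my reading of the CMIP6 Global Attributes document.
FURTHER_INFO_KEYS = (
    "mip_era",
    "institution_id",
    "source_id",
    "experiment_id",
    "sub_experiment_id",
    "variant_label",
)


def build_further_info_url(attrs):
    """Join the identifying attributes with dots and append them to the base URL."""
    return FURTHER_INFO_BASE + ".".join(attrs[key] for key in FURTHER_INFO_KEYS)


url = build_further_info_url({
    "mip_era": "CMIP6",
    "institution_id": "MOHC",
    "source_id": "UKESM1-0-LL",
    "experiment_id": "amip",
    "sub_experiment_id": "none",  # assumed value for amip
    "variant_label": "r1i1p1f1",
})
print(url)
```

All of the inputs are attributes CMOR already has once the dataset file has been read, so the user would not need to supply the URL at all.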
Thank you for this systematic review of the CMOR checks. I may have an additional check for CMOR:
Within the source_id attribute in CMIP6_CV.json, there is the subattribute activity_participation. What is the procedure if a model produces data for an experiment defined by an activity that is not listed among the supported activities in that subattribute?
I can create those files with CMOR, so I guess that PrePARE will not report errors either. However, this might cause problems because, at operational scale, those files might be published before someone notices that the activity is not supported. For example, many source_ids differ by only one character. So if someone forgets to change LR to HR in the CMOR dataset attributes when switching models, CMOR will not stop but will instead change all connected attributes.
@durack1 may have something to say about this, but as I recall, the activity_participation list was meant to be purely informal and not definitive. We asked the groups to include all the activities they were interested in, but they may have left some out (and they may have included some that they had interest in, but in the end could not do). With this understanding, it would be a mistake for CMOR to raise an error if a source participated in an activity that wasn't in the CV list.
I don't know how CMOR could detect a typo in the source_id, as you suggest might happen. It would populate "source" (and other attributes) consistent with the wrong source_id, but doesn't it send a warning message that it is doing this?
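If the activity_participation list is treated as informal, a warning rather than an error would be the natural fit. A minimal Python sketch of that idea follows. The CV excerpt, the model names MODEL-LR and MODEL-HR, and the function name are all invented for illustration, and nothing here reflects what CMOR or PrePARE actually do.

```python
import warnings

# Hypothetical CV excerpt; model names and activity lists are invented.
SOURCE_CV = {
    "MODEL-LR": {"activity_participation": ["CMIP", "ScenarioMIP"]},
    "MODEL-HR": {"activity_participation": ["CMIP", "HighResMIP"]},
}


def warn_if_unlisted_activity(source_id, activity_id, cv=SOURCE_CV):
    """Return True if activity_id is listed for source_id.
    Otherwise emit a warning (not an error) and return False."""
    listed = cv[source_id]["activity_participation"]
    if activity_id in listed:
        return True
    warnings.warn(
        f"{source_id} does not list {activity_id} in activity_participation "
        f"(listed: {', '.join(listed)}); check that source_id is correct"
    )
    return False
```

A warning of this kind would not block a legitimate late addition to a model's activities, but it would give an operator a chance to spot an LR/HR mix-up before publication.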
@taylor13 is correct. At the time of registration in https://github.com/WCRP-CMIP/CMIP6_CVs (which began in 2016), each model was registered against the list of MIPs that its group intended to contribute to. This list of MIPs has expanded over time (CDRMIP and PAMIP), so there may be instances where the registration doesn't match the current intent.
This is relevant for the ongoing work in the mip-cmor-tables and Project_CVs (e.g. CMIP6Plus_CVs), but not for the planned CMOR3 work, so I will close this issue.
Ping @wolfiex @taylor13 @matthew-mizielinski