PrePARE checks on cell_methods and cell_measures too strict?
In emails the following suggestion has been made by A. Adcroft:
I think it is useful to check the methods and measures attributes but at L605 the test is for
an exact match
if str(table_value) != str(file_value):
which could be made more forgiving. Something like
if str(table_value) not in str(file_value):
would at least allow additional attributes to be provided by the model. Whether a partial
match like this works for all attributes is not clear to me, and it could break some other
test since this conditional is used for other keys.
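To make the trade-off concrete, here is a minimal sketch comparing the two conditionals. The helper names `exact_match` and `substring_match` are hypothetical and are not part of PrePARE; they only wrap the two expressions quoted above.

```python
def exact_match(table_value, file_value):
    """Current behaviour: the file value must equal the table value."""
    return str(table_value) == str(file_value)


def substring_match(table_value, file_value):
    """Suggested behaviour: the table value must appear inside the file value."""
    return str(table_value) in str(file_value)


table = "area: areacello"

# Extra information supplied by the model is rejected by the exact test
# but accepted by the substring test.
extra = "area: areacello volume: volcello"
print(exact_match(table, extra), substring_match(table, extra))

# Reordered pairs fail both tests, so a substring test alone does not
# address ordering.
table_two = "area: areacello volume: volcello"
reordered = "volume: volcello area: areacello"
print(exact_match(table_two, reordered), substring_match(table_two, reordered))

# A substring test can also accept a value that merely starts with the
# expected one.
print(substring_match(table, "area: areacello_old"))
```

The last case suggests that a substring test may be too permissive for some keys, which fits the caution about the conditional being shared with other attributes.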
Let's discuss before acting on this suggestion.
@taylor13 @durack1
Maybe we should come back to this discussion since it is related to #587. How strict should the check for the cell_methods and cell_measures values be?
Should we expect the same order of words like "cell_measures": "area: areacello volume: volcello", or could they be in different order like "volume: volcello area: areacello"?
cell_methods has similar values such as "cell_methods": "area: mean (comment: over land and sea ice) time: point". Should we be removing the part in the parentheses and check if the file value is "area: mean time: point"? Should "time: point area: mean" also be a valid file value? How would we treat the value "area: time: mean"?
Should we check if the table value is a substring of the file value as suggested by A. Adcroft?
For cell_measures the ordering of the pairs separated by colons is arbitrary.
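Since the order of the pairs is arbitrary, one possible approach is to parse the attribute into a mapping and compare the mappings. This is only a sketch: `parse_cell_measures` and `cell_measures_equal` are hypothetical names, not existing PrePARE functions, and the pattern assumes each pair has the simple form `measure: variable`.

```python
import re


def parse_cell_measures(value):
    """Parse 'area: areacello volume: volcello' into a dict of measure -> variable."""
    return dict(re.findall(r"(\w+)\s*:\s*(\w+)", value))


def cell_measures_equal(table_value, file_value):
    """Compare two cell_measures strings while ignoring the order of the pairs."""
    return parse_cell_measures(table_value) == parse_cell_measures(file_value)


same = cell_measures_equal(
    "area: areacello volume: volcello",
    "volume: volcello area: areacello",
)
different = cell_measures_equal("area: areacello", "area: areacella")
print(same, different)
```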
For cell_methods, the order can matter (as discussed in the 1st paragraph below), and a construct like area: time: mean is legal but does not necessarily have the same meaning as area: mean time: mean (as discussed in the 2nd paragraph).
If more than one cell method is to be indicated, they should be arranged in the order
they were applied. The left-most operation is assumed to have been applied first.
Suppose, for example, that within each grid cell a quantity varies in both longitude and
time and that these dimensions are named "lon" and "time", respectively. Then values
representing the time-average of the zonal maximum are labeled cell_methods="lon:
maximum time: mean" (i.e. find the largest value at each instant of time over all
longitudes, then average these maxima over time); values of the zonal maximum of
time-averages are labeled cell_methods="time: mean lon: maximum". If the methods
could have been applied in any order without affecting the outcome, they may be put
in any order in the cell_methods attribute.
If a data value is representative of variation over a combination of axes, a single
method should be prefixed by the names of all the dimensions involved (listed in
any order, since in this case the order must be immaterial). Dimensions should be
grouped in this way only if there is an essential difference from treating the dimensions
individually. For instance, the standard deviation of topographic height within a
longitude-latitude gridbox could have cell_methods="lat: lon: standard_deviation".
(Note also, that in accordance with the recommendation of the following paragraph,
this could be equivalently and preferably indicated by cell_methods="area:
standard_deviation".) This is not the same as cell_methods="lon: standard_deviation
lat: standard_deviation", which would mean finding the standard deviation along each
parallel of latitude within the zonal extent of the gridbox, and then the standard
deviation of these values over latitude.
Removing any parenthetical segments first would be the right thing to do. Then it might not be too difficult to perform the checks. Turning off these checks might also be considered (and would be easy).
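A minimal sketch of such a check follows, assuming the rules quoted above: parenthetical segments are dropped, the order of methods is significant, and the axes grouped in front of a single method may appear in any order. All function names are hypothetical and not part of PrePARE, and nested parentheses are not handled.

```python
import re


def strip_parentheses(value):
    """Drop segments such as '(comment: over land and sea ice)'."""
    return re.sub(r"\s*\([^)]*\)", "", value).strip()


def parse_cell_methods(value):
    """Return an ordered list of (frozenset of axes, method) entries."""
    entries = []  # each item is (frozenset of axis names, list of method words)
    axes = []     # axis names collected since the last method
    for token in strip_parentheses(value).split():
        if token.endswith(":"):
            axes.append(token[:-1])
        elif axes:
            # First word after one or more axis names is the method itself.
            entries.append((frozenset(axes), [token]))
            axes = []
        elif entries:
            # Further words (e.g. 'where land') qualify the previous method.
            entries[-1][1].append(token)
    return [(names, " ".join(words)) for names, words in entries]


def cell_methods_equal(table_value, file_value):
    """Order-sensitive comparison of two cell_methods strings."""
    return parse_cell_methods(table_value) == parse_cell_methods(file_value)


table = "area: mean (comment: over land and sea ice) time: point"
print(cell_methods_equal(table, "area: mean time: point"))
print(cell_methods_equal(table, "time: point area: mean"))
print(cell_methods_equal("area: time: mean", "area: mean time: mean"))
```

Treating the order as always significant is the conservative choice here. Accepting reordered methods when the operations commute would need extra knowledge about which methods can be swapped, and this sketch does not attempt that.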
I didn't see the suggestion to check that the cmor_table value is part of the file name, but it would be a good thing to do (though not critical).
@taylor13 this is very old, so will close - please comment and reopen if required
I don't think CMOR needs to check this, but PrePARE should, and it would be a good idea to make the check smart enough to consider only the parts that matter. Could we record this somewhere for a future PrePARE release?
@taylor13 feel free to tag this against the 4.0/Future milestone and reopen - same for the other issues that I closed