PrePARE checks on cell_methods and cell_measures too strict??

Open taylor13 opened this issue 7 years ago • 3 comments

In emails the following suggestion has been made by A. Adcroft:

I think it is useful to check the methods and measures attributes but at L605 the test is for 
an exact match
    if str(table_value) != str(file_value):
which could be made more forgiving. Something like
    if str(table_value) not in str(file_value):
would at least allow additional attributes to be provided by the model. Whether a partial 
match like this works for all attributes is not clear to me, and it could break some other 
test since this conditional is used for other keys.
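
To make the difference between the two conditionals concrete, here is a minimal sketch. The helper names `exact_match` and `substring_match` are hypothetical and are not part of PrePARE; only the two comparisons themselves come from the suggestion above.

```python
def exact_match(table_value, file_value):
    """Current behaviour: the file value must equal the table value."""
    return str(table_value) == str(file_value)


def substring_match(table_value, file_value):
    """Suggested behaviour: the table value must appear inside the file value."""
    return str(table_value) in str(file_value)


table = "area: areacello"
extended = "area: areacello volume: volcello"

print(exact_match(table, extended))      # False: the extra entry is rejected
print(substring_match(table, extended))  # True: the extra entry is tolerated

# The substring test is still sensitive to the order of entries ...
print(substring_match("area: areacello volume: volcello",
                      "volume: volcello area: areacello"))  # False

# ... and it can accept values that merely start with the expected text.
print(substring_match("time: mean", "time: mean_absolute_value"))  # True
```

The last two cases suggest that a plain substring test may be both too strict (ordering) and too lenient (prefix matches), which is part of why the effect on other keys is unclear.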

Let's discuss before acting on this suggestion.

taylor13 avatar Dec 15 '18 17:12 taylor13

@taylor13 @durack1 Maybe we should come back to this discussion since it is related to #587. How strict should the check for the cell_methods and cell_measures values be?

Should we expect the same order of words like "cell_measures": "area: areacello volume: volcello", or could they be in different order like "volume: volcello area: areacello"?

cell_methods has similar values such as "cell_methods": "area: mean (comment: over land and sea ice) time: point". Should we be removing the part in the parentheses and check if the file value is "area: mean time: point"? Should "time: point area: mean" also be a valid file value? How would we treat the value "area: time: mean"?

Should we check if the table value is a substring of the file value as suggested by A. Adcroft?

mauzey1 avatar Mar 26 '20 18:03 mauzey1

For cell_measures the ordering of the pairs separated by colons is arbitrary.
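
An order-independent comparison for cell_measures could be sketched as follows. The function names are hypothetical, and the sketch assumes the value consists only of simple `measure: variable` pairs; any other form of table value would need separate handling.

```python
def parse_cell_measures(value):
    """Parse 'area: areacello volume: volcello' into a dict of measure -> variable."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected 'measure: variable' pairs, got {value!r}")
    measures = {}
    for key, variable in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected a measure name ending in ':', got {key!r}")
        measures[key[:-1]] = variable
    return measures


def cell_measures_equal(table_value, file_value):
    """Compare two cell_measures strings, ignoring the order of the pairs."""
    return parse_cell_measures(table_value) == parse_cell_measures(file_value)


print(cell_measures_equal("area: areacello volume: volcello",
                          "volume: volcello area: areacello"))  # True
print(cell_measures_equal("area: areacello", "area: areacella"))  # False
```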

For cell_methods, the order can matter (as discussed in the 1st paragraph below) and a construct like area: time: mean is legal but does not necessarily have the same meaning as area: mean time: mean (as discussed in the 2nd paragraph).

If more than one cell method is to be indicated, they should be arranged in the order 
they were applied. The left-most operation is assumed to have been applied first. 
Suppose, for example, that within each grid cell a quantity varies in both longitude and 
time and that these dimensions are named "lon" and "time", respectively. Then values 
representing the time-average of the zonal maximum are labeled cell_methods="lon: 
maximum time: mean" (i.e. find the largest value at each instant of time over all 
longitudes, then average these maxima over time); values of the zonal maximum of 
time-averages are labeled cell_methods="time: mean lon: maximum". If the methods 
could have been applied in any order without affecting the outcome, they may be put 
in any order in the cell_methods attribute.

If a data value is representative of variation over a combination of axes, a single 
method should be prefixed by the names of all the dimensions involved (listed in 
any order, since in this case the order must be immaterial). Dimensions should be 
grouped in this way only if there is an essential difference from treating the dimensions 
individually. For instance, the standard deviation of topographic height within a 
longitude-latitude gridbox could have cell_methods="lat: lon: standard_deviation". 
(Note also, that in accordance with the recommendation of the following paragraph, 
this could be equivalently and preferably indicated by cell_methods="area: 
standard_deviation".) This is not the same as cell_methods="lon: standard_deviation 
lat: standard_deviation", which would mean finding the standard deviation along each 
parallel of latitude within the zonal extent of the gridbox, and then the standard 
deviation of these values over latitude.

Removing any parenthetical segments first would be the right thing to do. Then it might not be too difficult to perform the checks. Perhaps, turning off these checks might also be considered (and would be easy).
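
A rough sketch of such a check, under the rules quoted above: parenthetical segments are dropped, the order of the methods is kept, and dimensions grouped under one method are compared as an unordered set. All function names are hypothetical and this is not how PrePARE currently works. The sketch ignores everything inside parentheses and treats the order of methods as always significant, since code cannot tell whether two methods commute.

```python
import re


def strip_parentheticals(value):
    """Remove non-nested '(...)' segments and collapse whitespace."""
    return " ".join(re.sub(r"\([^()]*\)", " ", value).split())


def parse_cell_methods(value):
    """Return an ordered list of (frozenset of dimensions, method text) entries."""
    entries = []
    dims, words = [], []
    for token in strip_parentheticals(value).split():
        if token.endswith(":"):
            if words:
                # A new 'name:' after method words starts the next entry.
                entries.append((frozenset(dims), " ".join(words)))
                dims, words = [], []
            dims.append(token[:-1])
        else:
            words.append(token)
    if dims or words:
        entries.append((frozenset(dims), " ".join(words)))
    return entries


def cell_methods_equal(table_value, file_value):
    """Compare cell_methods after normalisation; the order of methods matters."""
    return parse_cell_methods(table_value) == parse_cell_methods(file_value)


table = "area: mean (comment: over land and sea ice) time: point"
print(cell_methods_equal(table, "area: mean time: point"))    # True
print(cell_methods_equal(table, "time: point area: mean"))    # False
print(cell_methods_equal("area: time: mean", "area: mean time: mean"))  # False
print(cell_methods_equal("lat: lon: standard_deviation",
                         "lon: lat: standard_deviation"))     # True
```

Whether a reordered value such as "time: point area: mean" should be accepted would still need a decision, since the sketch rejects it by design.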

taylor13 avatar Mar 26 '20 19:03 taylor13

I didn't see the suggestion to check that the cmor_table value is part of the file name, but it would be a good thing to do (though not critical).

taylor13 avatar Mar 26 '20 19:03 taylor13

@taylor13 this is very old, so will close - please comment and reopen if required

durack1 avatar Apr 07 '24 16:04 durack1

I don't think CMOR needs to check this, but PrePARE should, and it would be a good idea to make the check smart enough to consider only the parts that matter. Could we record this somewhere for a future PrePARE release?

taylor13 avatar Apr 08 '24 19:04 taylor13

@taylor13 feel free to tag this against the 4.0/Future milestone and reopen - same for the other issues that I closed

durack1 avatar Apr 08 '24 20:04 durack1