cmor icon indicating copy to clipboard operation
cmor copied to clipboard

Resilliency measures to prevent table hacks

Open sashakames opened this issue 6 years ago • 14 comments

Example file: CMIP6/PMIP/NUIST/NESM3/lig127k/r1i1p1f1/Ofx/areacelli/gn/v20190924/areacelli_Ofx_NESM3_lig127k_r1i1p1f1_gn.nc CMOR should not produce this because it doesn't match the tables.

The file will pass PrePARE if I add the same made-up variable with the "sea" realm.

Possibilities: (1) CMOR/PrePARE only uses source repos with "tag" checkouts without modified files (2) gets Table data from a server.

sashakames avatar Jan 16 '20 21:01 sashakames

@sashakames I think the way that @doutriaux1 did this in the past was to compare the hash of the table file used with the version they identified they used. If the hash wasn't identical, then report a "user error"

durack1 avatar Jan 16 '20 22:01 durack1

@sashakames @durack1 is right. In the tables repo I stored the md5 of the tables see: https://github.com/PCMDI/cmip5-cmor-tables/blob/master/Lib/gen_table_md5s.py

doutriaux1 avatar Jan 16 '20 22:01 doutriaux1

and https://github.com/PCMDI/cmip5-cmor-tables/blob/master/Tables/md5s

doutriaux1 avatar Jan 16 '20 22:01 doutriaux1

While we're on this subject, for the CMIP6_CVs we embed some version information in the files, so e.g.:

    "version_metadata":{
        "CV_collection_modified":"Wed Jan 15 16:44:36 2020 -0800",
        "CV_collection_version":"6.2.45.7",
        "author":"Paul J. Durack <[email protected]>",
        "institution_id":"PCMDI",
        "mip_era_CV_modified":"Thu Aug 25 17:21:00 2016 -0700",
        "mip_era_CV_note":"Fix #36 - Add CV name to json structure",
        "previous_commit":"f07f21c4a1fa5b87a39d1fa51106fd7fad1583b8",
        "specs_doc":"v6.2.7 (10th September 2018; https://goo.gl/v1drZl)"

https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/mip_era.json#L9-L17

I note this is a little fiddly, as you can't embed the commit hash in the file that corresponds with that particular version as the change to the hash means you're always one step behind yourself, but this could be implemented external to the table files just like @doutriaux1 did for cmip5.

durack1 avatar Jan 16 '20 22:01 durack1

Reasonable solution and PrePARE should do so too, and we check that CMOR still will enforce this as it must have been disabled, unless NUIST created their own hashes to circumvent the check.

sashakames avatar Jan 16 '20 22:01 sashakames

@sashakames @doutriaux1 it would seem the CMOR already provides this info as netcdf file metadata, see: :table_info = "Creation Date:(09 May 2019) MD5:fa1fbd5904d4b890fb319f821b9ab99a" We'd just need to validate this is an "official" hash/date pair

durack1 avatar Jan 16 '20 22:01 durack1

yep that's what I implemented for CMIP5. I think there's even tools for it (attributes name may have changed in CMOR3), but essentially go fetch md5s file from github directly and validate it matches.

doutriaux1 avatar Jan 16 '20 22:01 doutriaux1

I double checked and the hash is there in the file that should be rejected!

sashakames avatar Jan 16 '20 22:01 sashakames

Sorry for all the edits. Clearly there must be a bug somewhere that the check isn't enforced even though the hash gets written out.

sashakames avatar Jan 16 '20 22:01 sashakames

no we never checked the hash! Sadly. I added the tag in CMOR2 with the hope that people would use to check. I created the tools in the cmip5-cmor-tables repo to check if the hash matches. But nobody ever cared to use this in production...

doutriaux1 avatar Jan 16 '20 22:01 doutriaux1

@sashakames just circling around on some of these issues, there has been conversations about creating conda packages for the cmip6-cmor-tables, if we did that, it would facilitate the automated testing far more easily - it would also allow us to embed md5 hashes in the package tar ball

durack1 avatar Aug 16 '22 22:08 durack1

@durack1 That sounds promising, so the embedded tables in the conda package would be much more difficult to tamper with. You just need to engineer the conda build so its not simple enough to download from source and install yourself.

sashakames avatar Aug 16 '22 22:08 sashakames

@matthew-mizielinski thinking that this issue should be moved across to the https://github.com/PCMDI/mip-cmor-tables/ repo?

durack1 avatar Nov 28 '22 17:11 durack1

A very good idea for CMOR4 planning, along with "shipping" CMOR with the mip-cmor-tables - somewhat of a dupe for #676

durack1 avatar Apr 07 '24 16:04 durack1