
Should there be a PrePARE testing suite?

Open mauzey1 opened this issue 6 years ago • 12 comments

There are currently tests for the C, Fortran, and Python CMOR API but no tests for PrePARE. This would be helpful in making sure PrePARE is functioning as intended after making changes.

We could have a list of sample NetCDF files, or generate test data with CMOR. Any suggestions?
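
As one illustration of the second option, generating a tiny file with the CMOR Python API might look roughly like the sketch below. The table path and dataset input JSON are assumptions and would need to point at real files:

```python
# Sketch: generate a small CMIP6-style NetCDF file with the CMOR Python API.
# The table path and the dataset metadata JSON below are placeholders, not
# existing repository files.
import numpy as np
import cmor

cmor.setup(inpath="cmip6-cmor-tables/Tables",
           netcdf_file_action=cmor.CMOR_REPLACE)
cmor.dataset_json("CMOR_input_example.json")  # assumed dataset metadata file

cmor.load_table("CMIP6_Amon.json")
time = cmor.axis(table_entry="time", units="days since 2000-01-01",
                 coord_vals=np.array([15.5]),
                 cell_bounds=np.array([0.0, 31.0]))
lat = cmor.axis(table_entry="latitude", units="degrees_north",
                coord_vals=np.array([0.0]),
                cell_bounds=np.array([-1.0, 1.0]))
lon = cmor.axis(table_entry="longitude", units="degrees_east",
                coord_vals=np.array([180.0]),
                cell_bounds=np.array([179.0, 181.0]))
ts = cmor.variable(table_entry="ts", units="K", axis_ids=[time, lat, lon])
cmor.write(ts, np.full((1, 1, 1), 288.0, dtype=np.float32))
cmor.close()
```

A file like this could serve as a known-good baseline, with deliberately broken variants derived from it.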

mauzey1 avatar May 24 '19 17:05 mauzey1

@mauzey1 I think the CV tests are the PrePARE tests. @taylor13 @dnadeau4 am I right?

doutriaux1 avatar May 24 '19 17:05 doutriaux1

https://github.com/PCMDI/cmor/blob/master/.circleci/config.yml#L62

doutriaux1 avatar May 24 '19 17:05 doutriaux1

@doutriaux1 Those tests seem to only test the CMOR Python interface and the CMIP6 tables. I was thinking of tests for the PrePARE.py script.

mauzey1 avatar May 24 '19 17:05 mauzey1

@mauzey1 I guess you're right. @taylor13 any ideas for useful tests we could develop?

doutriaux1 avatar May 24 '19 17:05 doutriaux1

Yes, this is a good idea. Not sure I will have time soon to think about this.

taylor13 avatar Jul 26 '19 00:07 taylor13

@taylor13 @durack1 @sashakames

I would like to get back to this since there are many issues posted related to PrePARE (#532, #533, #534, #540, #541, #553). It would be helpful to have continuous integration tests for PrePARE covering the changes that will be made to it. The run_prepare_tests job currently used in CircleCI only tests the CMIP6 CV, not the PrePARE script.

I think we should make a directory of small NetCDF files with flaws that PrePARE should catch, as well as some that should pass.

There should be a script that runs the PrePARE tests and captures the stdout and stderr output to see whether it matches what we expect.

Any suggestions are welcome.
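
For concreteness, here is a minimal sketch of what such a runner could look like, assuming a hypothetical tests/prepare_data/ layout with pass/ and fail/ subdirectories and a recorded .expected message alongside each failing file (none of these names are existing CMOR conventions, and the PrePARE invocation may differ between versions):

```python
# Sketch of a PrePARE regression runner. The directory layout and the
# .expected files are assumptions for illustration; PrePARE itself is the
# real command-line tool shipped with CMOR.
import subprocess
from pathlib import Path

DATA = Path("tests/prepare_data")    # hypothetical test-data directory
TABLES = "cmip6-cmor-tables/Tables"  # path to the CMIP6 tables

def run_prepare(nc_file):
    """Run PrePARE on one file; return its exit code and combined output."""
    result = subprocess.run(
        ["PrePARE", "--table_path", TABLES, str(nc_file)],
        capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr

failures = []
for nc_file in sorted(DATA.glob("pass/*.nc")):
    rc, out = run_prepare(nc_file)
    if rc != 0:
        failures.append(f"{nc_file}: expected pass, got exit code {rc}\n{out}")

for nc_file in sorted(DATA.glob("fail/*.nc")):
    rc, out = run_prepare(nc_file)
    expected = nc_file.with_suffix(".expected").read_text().strip()
    # A flawed file must be rejected *and* emit the message we recorded.
    if rc == 0 or expected not in out:
        failures.append(f"{nc_file}: expected error message not found")

for msg in failures:
    print(msg)
raise SystemExit(1 if failures else 0)
```

Something along these lines could slot into CircleCI next to the existing jobs.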

mauzey1 avatar Jan 03 '20 00:01 mauzey1

@mauzey1 thanks again for raising this, a test suite is a great idea.

To be honest, we have a huge multi-PB archive of CMIP6 files mounted on the css03 hardware, so coming up with a very comprehensive test suite wouldn't be an issue (we have every pathology you've ever thought of in the ~1 million files). I suppose CircleCI runs on remote systems, right? So we can't mount it directly?

durack1 avatar Jan 03 '20 00:01 durack1

Running PrePARE on a million files (if that is what is suggested), especially the very large files in the CMIP6 archive, would seem to be inefficient, and perhaps not even practical.

One way to make incremental progress would be to design a test each time we add or modify a PrePARE check, to determine whether it actually catches and correctly describes the problem. You may have a more ambitious (and comprehensive) testing strategy in mind, which I'd be happy to discuss next week.

I'm not sure the test suite should necessarily hold up moving forward on the issues mentioned above.

taylor13 avatar Jan 03 '20 16:01 taylor13

Agreed that a million files is too many, though a limit of 100 representative files is reasonable for a test suite that we expect to run repeatedly and that others can easily run to verify that their installation is working properly.

For starters, the files are here, but we could provide a script to allow others to download them.

As an aside wrt testing, I've run with 10000s of files via the publisher and could easily continue that.

sashakames avatar Jan 03 '20 17:01 sashakames

I'm not sure the number of files matters, especially if they are all QC'd files in the CMIP6 archive. How would such files test PrePARE's ability to identify non-compliant files? I think we need to specially construct non-compliant files and then see whether PrePARE catches the problems and provides helpful error messages to users.
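
As a sketch of what "specially constructed" could mean, one could derive a broken variant from a known-good file by deleting a required global attribute with netCDF4 (file names and the choice of attribute are purely illustrative):

```python
# Sketch: derive a deliberately non-compliant file from a known-good one by
# deleting a required global attribute. The input/output names and the choice
# of attribute are illustrative, not an existing CMOR convention.
import shutil
from netCDF4 import Dataset

shutil.copy("good_ts_Amon.nc", "bad_missing_experiment_id.nc")

with Dataset("bad_missing_experiment_id.nc", "a") as ds:
    ds.delncattr("experiment_id")  # PrePARE should reject this and say why
```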

taylor13 avatar Jan 03 '20 18:01 taylor13

Sorry folks, I should have been far clearer. I was not suggesting that the test suite use a million files; rather, I was suggesting that we select a subset of these files that captures known pathologies and then build the test suite on this comprehensive pathology archive. If we encounter new pathologies, we increment the test suite by one or more files. Also, for the purposes of the test suite, we could temporally subset the test files to a single time step (see the sketch below), reducing storage footprints and file copy/read times.
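
As a rough sketch of that temporal subsetting (file names illustrative; NCO's ncks -d time,0,0 would do the same job):

```python
# Sketch: keep only the first time step of an archive file to shrink the
# test fixture. File names are illustrative.
import xarray as xr

ds = xr.open_dataset("tas_Amon_full.nc", decode_times=False)
ds.isel(time=slice(0, 1)).to_netcdf("tas_Amon_1ts.nc")
ds.close()
```

One caveat: a rewrite like this can normalize some encodings, so it would be worth re-running PrePARE on the subset file to confirm the pathology of interest survives.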

durack1 avatar Jan 03 '20 19:01 durack1

Given that many files already published contain various pathologies (due to several causes), I agree with Paul that we can and should source these from the archive.

The related issues are mainly (1) "soft" false negatives, i.e. warnings that should be errors (return a -1 or False), and (2) poor error messages that leave the user scratching their head. Even when these are fixed, the test suite won't guarantee that we don't encounter additional files that produce (1) or (2). Additionally, I haven't yet seen any false positives; I think it's more likely we'll hear from a user that data fails when they are certain it should pass. Catching these isn't the goal of the general regression test suite, but it is something worth doing long-term.

sashakames avatar Jan 03 '20 19:01 sashakames