WEC-Sim_Applications icon indicating copy to clipboard operation
WEC-Sim_Applications copied to clipboard

Test MATLAB caching

Open H0R5E opened this issue 1 year ago • 3 comments

Find the envelope of MATLAB products for all tests and try to cache installation in order to reduce testing time.

Results

For the folder collection step:

OS Without Cache With Cache Reduction Cache Build Increase
Ubuntu 2m 12s 54s 1m 18s 4m 27s 2m 15s

For the Cable application example:

OS Without Cache With Cache Reduction Cache Build Increase
Ubuntu 4m 47s 3m 28s 1m 19s 8m 47s 4m
Windows 7m 55s 5m 11s 2m 44s 14m 44s 6m 49s

Discussion

Pros:

  • Approximately 2m 40s of savings for tests runs on Ubuntu when caches are hit
  • Approximately 4m of savings for tests run on Windows when caches are hit

Cons:

  • Approximately 6m 15s of delay for tests runs on Ubuntu when caches are built
  • Approximately 9m of delay for tests runs on Windows when caches are built
  • If a long-running test is used to build the cache, it will likely time out
  • Caches are deleted if not accessed after 7 days and this gap between builds can often occur for this repo

Conclusion

When the caches are hit, a reduction in testing time of roughly 3-4 minutes is achieved. For some tests, this is a decent saving; however, this is a small percentage of the time required for the longer running tests in this repo. Additionally, the significant increase to the testing time when the caches are being built might trigger unexpected failures if built during a long-running test. Whether this reduction in reliability of the test suite is worth it for the reduction in testing times is uncertain.

H0R5E avatar Sep 18 '24 15:09 H0R5E

@H0R5E thanks for looking into this. Any ideas why the OWC tests aren't running on Windows? https://github.com/WEC-Sim/WEC-Sim_Applications/pull/71

kmruehl avatar Sep 18 '24 18:09 kmruehl

@kmruehl, @akeeste I've run a little experiment on using caching for the MATLAB install. You do get some savings, but my concern would be that if the caches are removed and are regenerated in a long-running test (for a PR for example) then an unexpected failure may occur due to a timeout. This timeout might never be fixable, as the cache won't be stored. Perhaps this needs another experiment to check what will happen in this edge case.

EDIT: One way to deal with this would be to have a maximum time for any test of 30 minutes. There would then be time available to install MATLAB and build the caches before the 45-minute job timeout. We could enforce this by setting a timeout on the test running step, but this would generate a number of failures with the present test suite.

jobs.<job_id>.steps[*].timeout-minutes

H0R5E avatar Sep 19 '24 10:09 H0R5E

Thanks @H0R5E revisiting the tests is something we plan to prioritize for v6.2. Thank you for looking into possible resolutions.

kmruehl avatar Sep 19 '24 15:09 kmruehl

@H0R5E I can't see this board. Also, what's the status of this PR?

@kmruehl, sorry that was not meant to happen. I track my PRs on a personal Trello board for my own project management. I had to reinstall the GitHub plugin, and I didn't realize it would add a comment the PRs by default. I've turned that feature off again now.

Regarding this PR, the code is functional, but it's up to you whether you want to use it, given the pros and cons I have discussed. My advice would probably be not to use it because you can't guarantee that there won't be timeouts when doing cache building with the current test suite. If you can get all the tests below 30 minutes, then it would be safe to use, IMO.

H0R5E avatar Nov 01 '24 12:11 H0R5E

So, this one has been bugging me. It turns out that I set the job timeout to 45 minutes in this commit in order to avoid waiting for hung tests for the default of 6 hours. Sorry, I suffered a little bit of confirmation bias with this one - I had convinced myself this was a GitHub imposed limit.

I think having a timeout to catch hung tests is sensible, but it was clearly in the wrong place. So, I've moved the timeout to the actual test running step. This will allow time for the caches to build and so this PR can be safely merged now.

EDIT: Note that this doesn't change the fact that when the caches are built, test completion will be much slower.

H0R5E avatar Nov 14 '24 09:11 H0R5E

@H0R5E thanks for looking into this. I'm going to go ahead and close this PR without merging because the pros/cons are somewhat even. Perhaps it's something we'll revisit in the future.

kmruehl avatar Mar 01 '25 05:03 kmruehl