Cache conda environment between CI test runs
Closes #6853
Follows the steps outlined in the setup-miniconda docs to cache the conda environment between test runs.
Note that (as pointed out by @gjoseph92), since we use unpinned CI dependencies in our environments, the cached environment can become outdated. I'm hoping to address this with a keyword that can be added to PR or commit messages to skip the cache and install the environment from scratch.
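For anyone unfamiliar with the pattern, here is a rough sketch of the restore-or-rebuild flow, written in Python rather than workflow YAML so it can be run locally. The `cache_key` and `get_environment` helpers, the dict standing in for the cache, and the version strings are all hypothetical; the actual workflow uses the caching steps from the setup-miniconda docs. It also shows the staleness concern: with unpinned dependencies the environment file does not change when new packages are released, so the key does not change either.

```python
import hashlib


def cache_key(runner_os: str, env_file_text: str) -> str:
    """Build a key from the OS and a hash of the environment file contents."""
    digest = hashlib.sha256(env_file_text.encode()).hexdigest()[:12]
    return f"conda-{runner_os}-{digest}"


def get_environment(cache: dict, key: str, solve):
    """Return (environment, cache_hit); solve from scratch only on a miss."""
    if key in cache:
        return cache[key], True
    env = solve()
    cache[key] = env
    return env, False


# Hypothetical unpinned environment file: no versions are specified.
env_file = "dependencies:\n  - dask\n  - pytest\n"
cache = {}
key = cache_key("Linux", env_file)

# First run: nothing cached yet, so the environment is solved and stored.
first, hit1 = get_environment(cache, key, lambda: {"dask": "2022.8.0"})

# A newer release appears upstream, but the file (and so the key) is
# unchanged, which means the second run restores the older environment.
second, hit2 = get_environment(cache, key, lambda: {"dask": "2022.8.1"})
```

Here `hit2` is true and `second` still holds the first solve, which is the case the skip keyword is meant to handle.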
- [ ] Tests added / passed
- [ ] Passes `pre-commit run --all-files`
Sweet, thanks @charlesbluca!
How can we test this out? Seems like tests don't actually run when you just modify the GitHub actions yamls. Maybe push a spurious change to something else to trigger tests, wait a bit, then push another change and make sure the second build used the cache?
I don't see anything in the modified workflow implying that it only runs on changes to certain files, so I think we might just be blocked by other running jobs in the Dask org. I'll ping this PR with an empty commit in a couple of hours to see if things run then. My plan was to do what you described, and then push a commit containing [skip-caching] to check that we're able to disable caching as expected.
In other PRs, I'm used to seeing jobs at least queued, if not running, as soon as something gets pushed. Like on https://github.com/dask/distributed/pull/6856 right now, I see:

Maybe try pushing a spurious commit right now? Not sure if anything will change if we wait.
Thanks @jrbourbeau 🤦🏼‍♂️ didn't realize the failed workflow syntax wouldn't pop up here, will fix now
Looks like that got things working
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
15 files ±0, 15 suites ±0, 6h 34m 30s :stopwatch: +6m 40s
2 992 tests ±0: 2 902 :heavy_check_mark: ±0, 88 :zzz: ±0, 2 :x: ±0
22 189 runs ±0: 21 137 :heavy_check_mark: -2, 1 050 :zzz: +2, 2 :x: ±0
For more details on these failures, see this check.
Results for commit 5f85ef7e. ± Comparison against base commit bf537608.
:recycle: This comment has been updated with latest results.
Weird, the cache didn't work between those two commits, even though they seemed to have the same cache key. I had figured the cache would be saved as soon as the cache task completed, but maybe because the first CI job was interrupted, it wasn't?
It looks like maybe the cache isn't being saved for failed runs?
Unfortunately, due to GitHub Actions running on UTC, the cached environments for the non-macOS runs for https://github.com/dask/distributed/pull/6855/commits/a08262380d8064a66b87991c8f2cd37e1e2a0bc3 are no longer valid and need to be solved again. However, the macOS runs should have created a cached environment for today, but I notice the failed 3.8 runs didn't bother to cache the environment after failing (example).
However, the macOS 3.10 run succeeded and did end up saving the conda environment in the cache, which we can see getting picked up here and here. I notice that the time it takes to restore the cache is variable (in one run it takes >30 seconds, in another it takes nearly 4 minutes).
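To illustrate the UTC point above, here is a small Python sketch of a date-stamped cache key. It assumes the key embeds the current UTC date as `YYYYMMDD`; the `date_stamp` and `dated_key` helpers and the timestamps are hypothetical, not taken from the workflow.

```python
from datetime import datetime, timedelta, timezone


def date_stamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as a YYYYMMDD stamp in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d")


def dated_key(runner_os: str, moment: datetime) -> str:
    """Build a cache key that changes once per UTC day."""
    return f"conda-{runner_os}-{date_stamp(moment)}"


# Hypothetical timestamps: two runs 30 minutes apart on the same local
# afternoon (UTC-7), but on either side of midnight UTC.
local = timezone(timedelta(hours=-7))
before = datetime(2022, 8, 8, 16, 45, tzinfo=local)  # 23:45 UTC, Aug 8
after = datetime(2022, 8, 8, 17, 15, tzinfo=local)   # 00:15 UTC, Aug 9

key_before = dated_key("Linux", before)
key_after = dated_key("Linux", after)
```

The two keys differ even though both runs happened on the same local day, so the second run would miss the cache and solve the environment again.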
EDIT:
Some quick digging into this shows that the official caching action only saves the cache if the job it ran in succeeds.
This is somewhat annoying considering the flaky nature of Distributed's testing, but is reasonable considering we might not want the conda environment to get cached if it would end up blocking CI. There is a fork of the standard caching action that always updates the cache; I'm not sure if this is something we want to consider.
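A toy Python model of the difference, in case it helps the discussion. The `run_job` function and its `save_always` flag are hypothetical stand-ins for the two behaviours (save only on success versus always save); this is not how either action is implemented.

```python
def run_job(cache: dict, key: str, tests_pass: bool, save_always: bool = False) -> bool:
    """Simulate one CI job; return True if the environment came from the cache."""
    hit = key in cache
    env = cache[key] if hit else "freshly solved env"
    # The save step runs after the tests, so a failing job can skip it.
    if not hit and (tests_pass or save_always):
        cache[key] = env
    return hit


# Save-on-success: a failed first run leaves nothing behind.
on_success = {}
run_job(on_success, "k", tests_pass=False)
hit_after_failure = run_job(on_success, "k", tests_pass=True)
hit_after_success = run_job(on_success, "k", tests_pass=True)

# Always-save: the environment is cached even though the tests failed.
always = {}
run_job(always, "k", tests_pass=False, save_always=True)
hit_with_always = run_job(always, "k", tests_pass=False, save_always=True)
```

With save-on-success, a flaky failure costs one extra environment solve on the next run; with always-save, a broken environment could be cached and reused until the key changes.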
Looks like all the OS / python version environments are now cached and we can see the speed up in:
https://github.com/dask/distributed/actions/runs/2822501538
Looks like the option to skip caching is working fine. Commits with [skip-caching] in the message should now skip the caching process and resolve the environment from scratch, which should be good in cases where we need to pull in the latest CI deps.
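As a minimal sketch of the intended behaviour, assuming a plain substring check on the commit message (the workflow's actual condition may differ, and `should_use_cache` is a hypothetical name):

```python
SKIP_KEYWORD = "[skip-caching]"


def should_use_cache(commit_message: str) -> bool:
    """Return False when the commit message asks for a fresh environment."""
    return SKIP_KEYWORD not in commit_message


use_cache_normal = should_use_cache("Fix flaky test in test_worker.py")
use_cache_skipped = should_use_cache("Bump CI deps [skip-caching]")
```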
Is there anywhere it would make sense to document this option, or the caching behavior?
cc @gjoseph92 @jrbourbeau this should be ready for review now