pre-commit.ci time outs (just over 3m)
Recently many pull requests and some of the checks on the main branch are failing at pre-commit.ci due to time outs, we appear to now slip over the 3m limit.
I think the first master branch failure was https://github.com/biopython/biopython/commit/e33e229f07bcc4ffdcbb3b1ff21cbc93acd88137 - see https://results.pre-commit.ci/run/github/151541/1655293312.ldyU2MwLRgmi2MlTrhAOsw which got as far as codespell (the penultimate check).
More recent runs seem to only get as far as blacken-docs.
There have been no changes to our configuration https://github.com/biopython/biopython/blob/master/.pre-commit-config.yaml for 2 months, so this is not down to a version change in one of the tools.
Looking at a sample of earlier successful runs, the pre-commit run itself (after setup) was apparently taking about 2m10s - but had started to creep up. Perhaps a code or documentation change has hit a performance bottleneck in one of these tools?
Note we are already skipping flake8 itself on this check.
Note pre-commit.ci runs on all the files, it does not attempt to narrow this down to recently changed files (this functionality has been suggested and rejected).
One possible solution may be paying for a longer time out (although this does not seem to be "on the menu" in the GitHub marketplace options).
Running locally (and skipping flake8 as per the current pre-commit.ci setup):
$ rm -rf ~/Library/Caches/black && time SKIP=flake8 pre-commit run -a --verbose
check that executables have shebangs.....................................Passed
- hook id: check-executables-have-shebangs
- duration: 0.09s
check json...............................................................Passed
- hook id: check-json
- duration: 0.08s
debug statements (python)................................................Passed
- hook id: debug-statements
- duration: 2.3s
fix end of files.........................................................Passed
- hook id: end-of-file-fixer
- duration: 0.09s
mixed line ending........................................................Passed
- hook id: mixed-line-ending
- duration: 4.35s
black....................................................................Passed
- hook id: black
- duration: 65.41s
All done! ✨ 🍰 ✨
588 files would be left unchanged.
flake8..................................................................Skipped
- hook id: flake8
blacken-docs.............................................................Passed
- hook id: blacken-docs
- duration: 2.42s
rstcheck.................................................................Passed
- hook id: rstcheck
- duration: 1.18s
doc8.....................................................................Passed
- hook id: doc8
- duration: 1.2s
codespell................................................................Passed
- hook id: codespell
- duration: 0.31s
Enforce https://doi.org/ style...........................................Passed
- hook id: doi-link-style
- duration: 0.22s
real 1m20.538s
user 5m15.117s
sys 0m5.871s
Black is obviously taking most of the time, with debug-statements and mixed-line-ending surprisingly slow.
Is black running on every file, or just the ones changed in the PR?
If running the tool pre-commit locally as a git pre-commit hook, it only checks changed files - so it is very fast.
If running the tool pre-commit online as part of the pre-commit.ci service, it checks all the files, which can be slow. The authors does this intensionally, see e.g. https://github.com/pre-commit-ci/issues/issues/86 (point 5) and https://github.com/pre-commit-ci/issues/issues/90 (explaining in more detail why not).
If I understand the argument on issue 90 referenced, I think it applies more to the usage of black automatically upon committing, instead of on the whole code base. Since we use it just for linting, would it be such an issue to run only on changed files? We could then move pre-commit back to the github actions workflow (since it would be quick fast).
True, we could have pre-commit the tool running on GitHub Actions or where ever. However, that would mean doing something other than pre-commit run -a (run on all files), and instead pre-commit run FILE1 FILE2 ... where we are passing a list of changed files from git. If you're happy to explore that, great.
My understanding was for pre-commit.ci the service is the costs of getting that list of changed files via git/github API calls typically outweighs the benefits - but for our usage that may not be the case?
However, that would mean doing something other than pre-commit run -a (run on all files), and instead pre-commit run FILE1 FILE2 ... where we are passing a list of changed files from git. If you're happy to explore that, great.
I think we had that before actually: https://github.com/biopython/biopython/blob/master/.github/workflows/ci.yml#L26. This action returns the list of changed files only and we can pass that to pre-commit like this.
Actually, I think we are already running pre-commit locally as well, just checked the CI workflow. So maybe we can disable pre-commit.ci?
Huh, we are - https://github.com/biopython/biopython/blob/644f0416e3bc7bcb6bdc331bc42c629b0d36489f/.github/workflows/ci.yml#L54
And that might we quicker to finish (thanks to only doing altered files?)
Likely!
This would be a three step process to do it most smoothly, but probably one PR plus disable the GitHub integration alongside merging is enough:
- Remove pre-commit.ci badge from README
- Disable the GitHub integration
- Drop the "ci" section of
.pre-commit-config.yaml
The wording in CONTRIBUTING.rst is already sufficiently broad - deliberately not naming which CI services we use as that has been in flux the last few of years.
👍 sounds good!
As per discussion on #3967, would be nice to have pre-commit autoaupdate working via GitHub Actions before turning off the pre-commit.ci service.
In the meantime, the recommendation is to skip (almost) all the hooks on pre-commit.ci
I have been using the pre-commit.ci service to apply fixers to pull requests automatically (e.g. pyupgrade, black, reorder_python_imports - example at https://github.com/peterjc/thapbi-pict/pull/473 ).
If we enable that, we would only need the pre-commit.ci server to run those hooks which are also setup as fixers.
While working on that, may I suggest adding isort to the pre-commit hooks? It would standardize import orders on the whole project, separate imports into sections, and list them alphabetically. This is quite useful when dealing with files with several imports.
@valentin994 see #2828 for sorting imports
Lately pre-commit.ci has been hitting the 3m timeout while running black, perhaps a minority of cases, but enough to be frustrating.
Things improved in the last month or so, we're skipping flake8 and rarely time out.