biopython pre-commit.ci time outs (just over 3m)

Recently many pull requests and some of the checks on the main branch are failing at pre-commit.ci due to time outs, we appear to now slip over the 3m limit.

I think the first master branch failure was https://github.com/biopython/biopython/commit/e33e229f07bcc4ffdcbb3b1ff21cbc93acd88137 - see https://results.pre-commit.ci/run/github/151541/1655293312.ldyU2MwLRgmi2MlTrhAOsw which got as far as codespell (the penultimate check).

More recent runs seem to only get as far as blacken-docs.

There have been no changes to our configuration https://github.com/biopython/biopython/blob/master/.pre-commit-config.yaml for 2 months, so this is not down to a version change in one of the tools.

Looking at a sample of earlier successful runs, the pre-commit run itself (after setup) was apparently taking about 2m10s - but had started to creep up. Perhaps a code or documentation change has hit a performance bottleneck in one of these tools?

Note we are already skipping flake8 itself on this check.

Note pre-commit.ci runs on all the files, it does not attempt to narrow this down to recently changed files (this functionality has been suggested and rejected).

One possible solution may be paying for a longer time out (although this does not seem to be "on the menu" in the GitHub marketplace options).

Jun 17 '22 08:06 peterjc

Running locally (and skipping flake8 as per the current pre-commit.ci setup):

$ rm -rf ~/Library/Caches/black && time SKIP=flake8 pre-commit run -a --verbose
check that executables have shebangs.....................................Passed
- hook id: check-executables-have-shebangs
- duration: 0.09s
check json...............................................................Passed
- hook id: check-json
- duration: 0.08s
debug statements (python)................................................Passed
- hook id: debug-statements
- duration: 2.3s
fix end of files.........................................................Passed
- hook id: end-of-file-fixer
- duration: 0.09s
mixed line ending........................................................Passed
- hook id: mixed-line-ending
- duration: 4.35s
black....................................................................Passed
- hook id: black
- duration: 65.41s

All done! ✨ 🍰 ✨
588 files would be left unchanged.

flake8..................................................................Skipped
- hook id: flake8
blacken-docs.............................................................Passed
- hook id: blacken-docs
- duration: 2.42s
rstcheck.................................................................Passed
- hook id: rstcheck
- duration: 1.18s
doc8.....................................................................Passed
- hook id: doc8
- duration: 1.2s
codespell................................................................Passed
- hook id: codespell
- duration: 0.31s
Enforce https://doi.org/ style...........................................Passed
- hook id: doi-link-style
- duration: 0.22s

real	1m20.538s
user	5m15.117s
sys	0m5.871s

Black is obviously taking most of the time, with debug-statements and mixed-line-ending surprisingly slow.

Jun 17 '22 09:06 peterjc

Is black running on every file, or just the ones changed in the PR?

Jun 17 '22 15:06 JoaoRodrigues

If running the tool pre-commit locally as a git pre-commit hook, it only checks changed files - so it is very fast.

If running the tool pre-commit online as part of the pre-commit.ci service, it checks all the files, which can be slow. The authors does this intensionally, see e.g. https://github.com/pre-commit-ci/issues/issues/86 (point 5) and https://github.com/pre-commit-ci/issues/issues/90 (explaining in more detail why not).

Jun 17 '22 15:06 peterjc

If I understand the argument on issue 90 referenced, I think it applies more to the usage of black automatically upon committing, instead of on the whole code base. Since we use it just for linting, would it be such an issue to run only on changed files? We could then move pre-commit back to the github actions workflow (since it would be quick fast).

Jun 17 '22 17:06 JoaoRodrigues

True, we could have pre-commit the tool running on GitHub Actions or where ever. However, that would mean doing something other than pre-commit run -a (run on all files), and instead pre-commit run FILE1 FILE2 ... where we are passing a list of changed files from git. If you're happy to explore that, great.

My understanding was for pre-commit.ci the service is the costs of getting that list of changed files via git/github API calls typically outweighs the benefits - but for our usage that may not be the case?

Jun 18 '22 17:06 peterjc

However, that would mean doing something other than pre-commit run -a (run on all files), and instead pre-commit run FILE1 FILE2 ... where we are passing a list of changed files from git. If you're happy to explore that, great.

I think we had that before actually: https://github.com/biopython/biopython/blob/master/.github/workflows/ci.yml#L26. This action returns the list of changed files only and we can pass that to pre-commit like this.

Jun 18 '22 18:06 JoaoRodrigues

Actually, I think we are already running pre-commit locally as well, just checked the CI workflow. So maybe we can disable pre-commit.ci?

Jun 21 '22 16:06 JoaoRodrigues

Huh, we are - https://github.com/biopython/biopython/blob/644f0416e3bc7bcb6bdc331bc42c629b0d36489f/.github/workflows/ci.yml#L54

And that might we quicker to finish (thanks to only doing altered files?)

Jun 21 '22 20:06 peterjc

Likely!

Jun 21 '22 21:06 JoaoRodrigues

This would be a three step process to do it most smoothly, but probably one PR plus disable the GitHub integration alongside merging is enough:

Remove pre-commit.ci badge from README
Disable the GitHub integration
Drop the "ci" section of .pre-commit-config.yaml

The wording in CONTRIBUTING.rst is already sufficiently broad - deliberately not naming which CI services we use as that has been in flux the last few of years.

Jun 22 '22 08:06 peterjc

👍 sounds good!

Jun 22 '22 23:06 JoaoRodrigues

As per discussion on #3967, would be nice to have pre-commit autoaupdate working via GitHub Actions before turning off the pre-commit.ci service.

In the meantime, the recommendation is to skip (almost) all the hooks on pre-commit.ci

Jul 07 '22 10:07 peterjc

I have been using the pre-commit.ci service to apply fixers to pull requests automatically (e.g. pyupgrade, black, reorder_python_imports - example at https://github.com/peterjc/thapbi-pict/pull/473 ).

If we enable that, we would only need the pre-commit.ci server to run those hooks which are also setup as fixers.

Jul 07 '22 11:07 peterjc

While working on that, may I suggest adding isort to the pre-commit hooks? It would standardize import orders on the whole project, separate imports into sections, and list them alphabetically. This is quite useful when dealing with files with several imports.

Jul 14 '22 08:07 valentin994

@valentin994 see #2828 for sorting imports

Jul 14 '22 09:07 peterjc

Lately pre-commit.ci has been hitting the 3m timeout while running black, perhaps a minority of cases, but enough to be frustrating.

Sep 09 '22 14:09 peterjc

Things improved in the last month or so, we're skipping flake8 and rarely time out.

Nov 15 '22 15:11 peterjc