Replace `glob` with `compare-changes`
What / Why
- Fix https://github.com/mpalmer/action-validator/issues/105
- and https://github.com/mpalmer/action-validator/issues/27
How
I had some free time on my hands, so I spent Nov 21 to Nov 30 building a Rust library called compare-changes, which reimplements the filter pattern matching from GitHub's workflow syntax.
While setting up a trivial parser/pattern matcher is easy enough, one soon runs into things like memoization, backtracking, pathological patterns, and automata. These are interesting, but would take me ages to use or account for, so the library instead relies on chumsky and regex for the difficult runtime bits.
Overview of compare-changes
- It is a library intended to replace `glob` for `action-validator`
- It is a CLI / a GitHub Action that replaces my previous (poorly implemented) Python-based compare-changes. The issue is not Python itself (it is arguably the better choice when targeting GitHub Actions); the old version was a quick hack that worked well enough that it wasn't worth improving until now.
How the library works:
- lib.rs defines the public API, and does some input normalization on the provided path string [L41-45] to match GitHub's behavior.
- path/mod.rs uses chumsky to parse the input strings into Rust structs/enums so the data is easier to work with down the line. It also performs some validation, e.g. of brackets. An alternative would have been nom, but my understanding is that chumsky is a bit better at emitting errors. Most of the integration tests focus on validation errors.
Chumsky is a parser library for Rust that makes writing expressive, high-performance parsers easy.
- convert/mod.rs ingests the Rust structs/enums and builds regex patterns that match the GitHub Actions workflow syntax filter patterns spec. It has (imho) lots of unit tests in convert/test.rs, and it even performs some validation on the patterns by emitting errors on invalid ones (see L300-324).
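To make the parse-then-convert flow above concrete, here is a minimal std-only sketch. It is not the code from compare-changes: the `Token` enum and the `parse` and `to_regex` functions are hypothetical names, the parser is hand-written instead of using chumsky, and the output is only a regex string rather than a compiled `regex::Regex`. The operator meanings follow GitHub's filter pattern documentation (`*` does not cross `/`, `**` does, `?` and `+` apply to the preceding character).

```rust
/// Hypothetical token type; the real crate's structs/enums may look different.
#[derive(Debug, PartialEq)]
enum Token {
    Literal(char),
    Star,          // `*`: zero or more characters, but never `/`
    DoubleStar,    // `**`: zero or more of any character, including `/`
    Optional,      // `?`: zero or one of the preceding character
    OneOrMore,     // `+`: one or more of the preceding character
    Class(String), // `[...]`: bracket contents, kept verbatim
}

/// Turn a filter pattern into tokens, rejecting unclosed brackets.
fn parse(pattern: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        let token = match c {
            '*' => {
                // Two consecutive stars form a single `**` token.
                if chars.peek() == Some(&'*') {
                    chars.next();
                    Token::DoubleStar
                } else {
                    Token::Star
                }
            }
            '?' => Token::Optional,
            '+' => Token::OneOrMore,
            '[' => {
                let mut body = String::new();
                loop {
                    match chars.next() {
                        Some(']') => break,
                        Some(ch) => body.push(ch),
                        None => return Err(format!("unclosed '[' in pattern {pattern:?}")),
                    }
                }
                Token::Class(body)
            }
            other => Token::Literal(other),
        };
        tokens.push(token);
    }
    Ok(tokens)
}

/// Build an anchored regex string from the tokens.
/// Simplification: a leading `?` or `+` is not rejected here, and the
/// bracket contents are not validated or escaped.
fn to_regex(tokens: &[Token]) -> String {
    let mut out = String::from("^");
    for token in tokens {
        match token {
            Token::Literal(c) => {
                // Escape characters that carry meaning in regex syntax.
                if r"\.+*?()|[]{}^$".contains(*c) {
                    out.push('\\');
                }
                out.push(*c);
            }
            Token::Star => out.push_str("[^/]*"),
            Token::DoubleStar => out.push_str(".*"),
            Token::Optional => out.push('?'),
            Token::OneOrMore => out.push('+'),
            Token::Class(body) => {
                out.push('[');
                out.push_str(body);
                out.push(']');
            }
        }
    }
    out.push('$');
    out
}

fn main() {
    for pattern in ["*.js", "docs/**", "*.jsx?", "[abc"] {
        match parse(pattern) {
            Ok(tokens) => println!("{pattern} -> {}", to_regex(&tokens)),
            Err(e) => println!("{pattern} -> error: {e}"),
        }
    }
}
```

Keeping the parse step separate from the conversion step means validation errors (such as the unclosed bracket) surface before any regex is built, which is the same split the bullets above describe.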
While GitHub's documentation is helpful, it doesn't cover various edge cases of how pattern validation should behave. I therefore ran dummy workflows in various temporary repositories to observe the actual behavior, and eventually settled on anttiharju/tmp for the purpose.
While I am quite happy with the state of the library, I may of course have missed some edge cases, even if I am aware of a decent chunk of them. If people encounter further issues with action-validator regarding path patterns, feel free to point them to open an issue in anttiharju/compare-changes. These are fairly easy to provide repros for, and it should be easy enough for me to update the library.
Testing
The library has plenty of automated tests. I also tested manually:
- Usage as a Git hook (via Lefthook) still catches glob issues in pre-commit
- Works well in a monorepo (nixpkgs). Since the difficult runtime bits are handled by other crates I didn't expect issues here, but a hand-written implementation could have struggled.
Benchmarks
The main motivator for me here was fixing the slow Git hook, so here are some numbers from a repo that uses a lot of glob patterns (anttiharju/[email protected]).
worst case: from 7.883 s to 118.5 ms, roughly a 66x improvement (7883 / 118.5 ≈ 66.5)
best case: from 92.4 ms to 119.0 ms, a 26.6 ms overhead
The multiplier can vary a lot depending on the repo used for the benchmarks. In the performance issue discussion the multiplier was 19x [comment] because the worst case was not as bad. I did check that the numbers here are from 0.8.0 and not 0.6.0; with 0.6.0 the worst case went up to ~10 s.
Benchmarks were run on a 2021 14" MBP using hyperfine:
hyperfine --warmup 3 "git ls-files -z '.github/workflows/*.yml' '*/action.yml' | xargs -0 action-validator --verbose"
details
worst case
with a gitignored .direnv from nix-direnv
action-validator@27-fix-wildcards-105-fix-gitignore-perf
Time (mean ± σ): 118.5 ms ± 0.7 ms [User: 80.5 ms, System: 36.4 ms]
Range (min … max): 117.3 ms … 120.2 ms 23 runs
[email protected]
Time (mean ± σ): 7.883 s ± 0.137 s [User: 0.814 s, System: 7.062 s]
Range (min … max): 7.717 s … 8.195 s 10 runs
best case
no gitignored files ensured with
git reset --hard && git clean -dfx
action-validator@27-fix-wildcards-105-fix-gitignore-perf
Time (mean ± σ): 119.0 ms ± 1.0 ms [User: 80.6 ms, System: 36.4 ms]
Range (min … max): 117.8 ms … 121.3 ms 24 runs
[email protected]
Time (mean ± σ): 92.4 ms ± 0.8 ms [User: 66.6 ms, System: 26.2 ms]
Range (min … max): 91.3 ms … 94.3 ms 30 runs
Anything else
- Noticed that some tests (tests/fixtures/012_github_glob_syntax/glob.yml) did not match GitHub's behavior, so went ahead and fixed them.
- `action-validator` now shells out to `git` for `ls-files` to solve the performance issue when a repository has a lot of gitignored files.
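As a rough illustration of the second point, the sketch below shells out to `git ls-files -z` with `std::process::Command` and splits the NUL-separated output into paths. The function names `tracked_files` and `split_nul_separated` are hypothetical, and whether action-validator invokes Git with exactly these arguments is an assumption; the `-z` flag is borrowed from the benchmark command above. Since `git ls-files` lists tracked files by default, a large gitignored directory is never walked.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Split NUL-separated output (as produced by `git ls-files -z`) into paths.
fn split_nul_separated(raw: &[u8]) -> Vec<PathBuf> {
    raw.split(|byte| *byte == 0)
        // The trailing NUL yields one empty chunk; drop it.
        .filter(|chunk| !chunk.is_empty())
        .map(|chunk| PathBuf::from(String::from_utf8_lossy(chunk).into_owned()))
        .collect()
}

/// List files tracked by Git in the current working directory's repository.
/// Hypothetical helper, not the function used by action-validator.
fn tracked_files() -> std::io::Result<Vec<PathBuf>> {
    let output = Command::new("git").args(&["ls-files", "-z"]).output()?;
    if !output.status.success() {
        // Surface Git's own error message, e.g. when not inside a repository.
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    Ok(split_nul_separated(&output.stdout))
}

fn main() {
    match tracked_files() {
        Ok(files) => println!("{} tracked files", files.len()),
        Err(e) => eprintln!("git ls-files failed: {e}"),
    }
}
```

The NUL separator avoids ambiguity with paths that contain spaces or newlines. Note that `from_utf8_lossy` replaces invalid UTF-8 bytes, so a stricter implementation would need platform-specific handling for non-UTF-8 paths.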
Comments on the code in the library (lib.rs, path/mod.rs, convert/mod.rs) are also welcome, but not expected.