ci: parallelize the `mdbook build <language>` process
Make the mdbook build process parallel. This is a simple approach: clone the entire repository into new directories and run the builds in parallel. This is a demonstration for #2767.
The process appears to be mostly CPU bound rather than memory or disk heavy, so the number of cores on the CI runner matters. `mdbook build` does not seem to use more than one core, and a typical GitHub CI runner has 4 cores, so the expected speedup would be around 4x.
Currently the publish workflow takes 30 minutes, 25 of which are the build process. With a 4x speedup on the build this would be reduced to roughly 5 + 25/4 ≈ 11-12 minutes, thus more than halving the total time.
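For illustration, a minimal sketch of the idea; the language list, the clone directories, and the `build.sh` interface are placeholders, not the actual workflow:

```yaml
# Rough sketch only: clone the repository once per language and run the builds
# concurrently inside a single job. Languages and paths are made up here.
- name: Build all translations in parallel
  run: |
    for lang in da de es fr; do
      git clone . "/tmp/comprehensive-rust-$lang"
      (cd "/tmp/comprehensive-rust-$lang" && ./build.sh "$lang") &
    done
    wait  # block until every background mdbook build has finished
```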
Down from 25 minutes to 11 minutes. The speedup is less than expected, but still less than half the previous run time.
Nice!!
This now finished (even with missing Rust caches!) in 21 minutes: https://github.com/google/comprehensive-rust/actions/runs/15612132409. With caching this should go down quite a lot.
@mgeisler can you review the comprehensive-rust-all artifact to check whether it has the correct structure? This would show that uploading the intermediate artifacts and downloading them with merge works as expected.
The workflow still needs some refactoring since there is quite a bit of duplication now, but for the moment this only serves as a POC.
@djmitche @mgeisler any thoughts on the latest approach?
I like this approach, but I admit I don't know much about GitHub Actions, so if there are any subtle gotchas I would not be the person to detect them.
I think this is a great approach, well done, @michael-kerscher! Originally, we only had a simple `mdbook build`, which is very fast. But now, I guess the PDF generation is what slows things down dramatically? So it makes sense to do this in parallel.
As for the caching: if building mdbook (and the other binaries) takes a long time, we could build them just once, cache them, and reuse them in each job going forward. I forget whether we're able to run 20+ jobs in parallel, or if they run, say, 4 or 8 at a time? If it's the latter, then building mdbook once could help here.
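Something like a cache step in front of the tool installation might be enough; this is only a sketch, and the cache key, paths, and the `install-mdbook.sh` helper name are assumptions:

```yaml
# Cache the cargo-installed tools (mdbook and plugins) between jobs so each
# matrix job only pays the compile cost on a cache miss.
- name: Cache mdbook and plugins
  uses: actions/cache@v4
  with:
    path: ~/.cargo/bin
    key: mdbook-tools-${{ hashFiles('install-mdbook.sh') }}

- name: Install mdbook and plugins
  run: ./install-mdbook.sh  # hypothetical helper; ideally a no-op on a warm cache
```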
With #2915 merged, I was able to remove the extra build step and the extra creation and downloading of artifacts. The change is now reduced to a split into two jobs:
- `create-translation`: produces a per-language artifact using `build.sh`
- `publish`: downloads all language artifacts, generates the translation report, and publishes everything.

The previously sequential per-language builds now run as individual jobs in parallel, and it looks much better.
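For reference, the shape of the split looks roughly like this; the job names follow the list above, while the language list, script path, and artifact layout are placeholders:

```yaml
# Sketch of the two-job structure: a matrix job per language, then a publish
# job that merges all language artifacts into one tree.
jobs:
  create-translation:
    strategy:
      matrix:
        language: [en, da, de]  # placeholder list
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./build.sh ${{ matrix.language }}  # assumed per-language build script
      - uses: actions/upload-artifact@v4
        with:
          name: comprehensive-rust-${{ matrix.language }}
          path: book/

  publish:
    needs: create-translation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: comprehensive-rust-*
          merge-multiple: true  # merge all language artifacts into one tree
          path: site/
      # translation report and publishing steps would follow here
```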
Current state: For maximum efficiency we still (according to the current builds) need binaries for:
- `mdbook-svgbob`
- `i18n-report`
- `mdbook-linkcheck2` requires some additional thinking, e.g. building a binary in this repository so it is available for `cargo binstall`.
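For the installation itself, something along these lines would use prebuilt binaries where they exist and fall back to compiling; this is a sketch, and the exact tool list and availability of binary releases would need checking:

```yaml
# Prefer prebuilt binaries via cargo-binstall; compile from source only when a
# tool (e.g. mdbook-linkcheck2 today) has no binary release to download.
- name: Install mdbook plugins
  run: |
    cargo install cargo-binstall
    cargo binstall --no-confirm mdbook-svgbob i18n-report mdbook-linkcheck2 || \
      cargo install mdbook-svgbob i18n-report mdbook-linkcheck2
```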
We could also shave some minutes off the process by caching the apt installation, which is pretty slow, e.g. with https://github.com/marketplace/actions/install-and-cache-apt-tools
Ah, fun, I had a similar idea today and put #2916 up for review. Looking over the code, I don't see how this can be quicker than downloading things from the (internal) Apt mirrors GitHub maintains.
My thinking with regards to caching is that it should be used when the thing being cached takes a significant amount of time to create. Unpacking binaries from the GitHub cache or from an Apt mirror ought to be comparable in time.
In #2916, I found that most time was spent on updating the man page database... So telling dpkg to not install documentation helped a ton. My guess is that this could be why the GH Action you link to exists.
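For reference, one way to do this with dpkg looks roughly like the following; the config file name and the example package are placeholders and may not match what #2916 actually does:

```yaml
# Tell dpkg to skip docs and man pages so apt installs don't spend time on the
# man-db trigger; then install packages without recommends.
- name: Configure apt/dpkg for CI
  run: |
    sudo tee /etc/dpkg/dpkg.cfg.d/01-ci-nodoc <<'EOF'
    path-exclude /usr/share/doc/*
    path-exclude /usr/share/man/*
    path-exclude /usr/share/info/*
    EOF
    sudo apt-get update
    sudo apt-get install -y --no-install-recommends gettext  # example package
```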
That was exactly my thought: installing the packages did something with the package contents and processed the documentation. Your approach is even better, as there is no use for documentation in the CI environment (at least not yet; AI is not embedded deep enough to require that documentation :D)