[rush] Proposal: Rush mitigations for postinstall failures
Summary
This is a proposal for two Rush changes to improve the developer experience in a large monorepo where postinstall scripts often fail:
-
Faster retries by separating the install and postinstall operations, so that postinstall can be retried without redoing
pnpm install -
Avoid unnecessary postinstalls by exposing PNPM's
--ignore-scriptsparameter forrush installandrush update. (When is a postinstall "unnecessary"? For example you need to install that package as part ofrush update --full, but you aren't going to compile any projects that use it.)
(The term "postinstall" in this proposal includes multiple operations; see clarification below.)
Background
My employer has a large monorepo where postinstall fails unusually often. The root causes:
- the monorepo has lots of recently imported projects with different tech stacks and versions, so the number of NPM packages with postinstall is unusually large
- we also have multiple PNPM lockfiles, causing the same postinstall to run multiple times
- CI jobs run in geolocations with unreliable network connections to USA CDNs -- postinstall scripts often download large binaries without retry mechanisms, and without providing a way to redirect their URL to a geolocated mirror
A full rush install of the monorepo does 3 kinds of work:
- PNPM validates the lockfile, and in the case of
rush updatePNPM calculates an updated lockfile - PNPM downloads tarballs from the NPM registry and extracts them to make
node_modulesfolders - Lastly, PNPM runs the
"install","postinstall","prepare", etc lifecycle scripts for some NPM package dependencies. Typically these "postinstalls" download assets and/or invoke native compilers.
If step 3 fails, then we need to retry rush install repeatedly, for example to reattempt a failed download, or to find out if apt-get install has fixed a compile failure. Each iteration redoes all three steps from the beginning, which is painful because steps 1+2 are very slow.
What PNPM implements
PNPM already provides a mechanism for separating install+postinstall. It works like this:
# Install everything but skip postinstall
pnpm install --ignore-scripts
# Run the postinstall scripts for packages whose postinstall
# has not already completed successfully
pnpm rebuild --pending
This "pending" status is tracked in ./node_modules/.modules.yaml in a pendingBuilds field, for example:
. . .
packageManager: [email protected]
pendingBuilds:
- /core-js-pure/3.23.1
- /es5-ext/0.10.61
- /node-sass/7.0.1
- /encoding/0.1.13
- /node-sass/7.0.0
prunedAt: Tue, 12 Jul 2022 03:27:29 GMT
Proposed Solution
The solution has two parts:
Part A: Faster retries
When you run rush install, it invokes PNPM something like this:
~/.rush/node-v14.20.0/pnpm-6.29.1/node_modules/pnpm/bin/pnpm.cjs \
install --store ~/code/rushstack/common/temp/pnpm-store \
--frozen-lockfile \
--reporter ndjson \
--strict-peer-dependencies \
--recursive \
--link-workspace-packages false
Let's ALWAYS add --ignore-scripts to the above command line, and call that Stage 1. After it completes, we run Stage 2 which is pnpm rebuild --pending. This way if a postinstall script fails, then Rush can skip Stage 1 and go straight to Stage 2.
To be clear, if you invoke rush install or rush update without --ignore-scripts, we're proposing that PNPM will still be invoked with --ignore-scripts:
rush install
# Stage 1
pnpm install --ignore-scripts
# Stage 2
pnpm rebuild --pending
rush install --ignore scripts
# Stage 1
pnpm install --ignore-scripts
# (Stage 2 is skipped)
Part B: Avoid unnecessary postinstalls
Suppose that project X depends on a package D whose postinstall would fail on my computer. I don't want to fix this failure. (For example maybe the fix involves compiling Python 3 for my old Debian 9 box, which is pointless if my local work won't use this NPM package, or if I can rely on a CI job to build that project.)
- If I want to work on project
Ythat does NOT depend onD, then a subset install will avoid the problem:rush install --to Y - However as part of working on project
X, maybe I need to runrush update --fullfor the entire monorepo. This will need to installD. But the postinstall for D is NOT relevant to my work, because this operation only cares about versions/lockfiles. I will not compile projectY.
Let's expose --ignore-scripts to rush update:
# Regenerate the lockfile for the entire monorepo,
# package D gets installed but not postinstalled
rush update --full --ignore-scripts
# Perform postinstall for dependencies of Y so that I
# can compile Y. (But do NOT postinstall D because I
# will not compile X.)
rush install --to Y
Design considerations
-
In a world where postinstall always succeeds, deferring
pnpm rebuildto Stage 2 might be slightly slower, because it cannot run in parallel with other downloads. (Does PNPM actually parallelize that?) Maybe we should provide a setting to opt-in to this two-stage install? Something likerobustPostinstalls=true -
What about NPM and Yarn? It seems that neither Yarn Classic nor NPM implement the
pnpm rebuild --pendingfunctionality. Yarn Plug'n'Play does implement it, but is not yet supported by Rush. After some consideration, we're okay with this being a PNPM-only feature for now. This proposal doesn't take away any functionality for Yarn/NPM nor does it enable any new scenarios; it merely provides a tool to improve reliability of existing operations. -
How does
rush installknow whether postinstall has run already? We could store this state in a file similar tolast-install.flag. We could also parse./node_modules/.modules.yamland check whetherpendingBuildsis empty, but it would be better to avoid reliance on PNPM internals. -
How does
rush install --to Xknow whether postinstall has run for the subset of dependencies needed by X? A few possibilities: a. Rush could analyze the transitive dependencies of X. b. Rush could spawnpnpm build --pending --filter ...Xevery time? This will regress the execution time for an already up to daterush install --to X, but maybe that is negligible? c. We could disable the two-stage installation in the case of subset installs.
CC @chengcyber
Clarification: In this proposal, the term "postinstall" used as a general term for invoking several different lifecycle scripts (preinstall, install, postinstall, prepublish, prepare) and also recreating the node_modules/.bin scripts.
This seems useful to me, but wouldn't Part B be better handled by rush install --to <project you're working on> to avoid installing the problematic dependency at all?
This seems useful to me, but wouldn't Part B be better handled by
rush install --to <project you're working on>to avoid installing the problematic dependency at all?
@iclanton In the example, the person cannot solve it with rush install. Here's a more concrete scenario:
-
I need to upgrade the
jest-canvas-mockdependency for projectYbecause it has a new API that I need for my unit tests -
But
ensureConsistentversions requires me to upgrade all the other projects in the monorepo to have a consistent version. -
rush updatefails because the postinstall failed foroptipng-binused by projectX. The failing code looks like:try { // From https://sourceforge.net/projects/optipng/files/OptiPNG/ await binBuild.file(path.resolve(__dirname, '../vendor/source/optipng.tar.gz'), [ `./configure --with-system-zlib --prefix="${bin.dest()}" --bindir="${bin.dest()}"`,🤦♂️ 'make install'🤦♂️ ]); console.log('optipng built successfully'); } catch (error) {
It could take 30 minutes to track down the right prerequisites to make that succeed on my VM. But that's irrelevant to my work -- an upgrade of jest-canvas-mock is unlikely to break project Y, and if it did happen we'll find out when the CI job builds it.
Thus, I need rush update to regenerate the lockfile, whereas the postinstall of optipng-bin is irrelevant.
(Ideally we need to eliminate optipng-bin/lib/install.js from our build, but... one step at a time.🙃 )