Consistently perform bootstrap and encode Brotli config for improved caching and reduced code complexity
Note to self: consider closing #1177 when this is merged, if it does indeed apply to worker threads -- @cspotcode
Fixes ESM child process forking. Info on the solution can be found in #1814. This also implements #1831.
- This PR always performs the bootstrap through the child entry-point script (regardless of ESM or not).
- This will make the code more readable and obvious, avoiding two different code-paths. \
- If this is undesirable (I don't see any significant downside of doing this always), we can easily avoid the spawn if
--esmis not enabled, and jump directly intophase4/completeBootstrap. - Arguably ESM will become more prominent in the future, as the ecosystem. It seems likely that the child process mechanism is always needed to add flags to the Node runtime.
- If this is undesirable (I don't see any significant downside of doing this always), we can easily avoid the spawn if
- The brotli payload could always contain the full state, but this PR uses TypeScript's type system to strictly enforce that only explicit properties are cached for forked processes.
- This results in reduced bytes for the Brotli-encoded state into
execArgv. - It makes it more safe and an explicit choice what info can be used/cached for forked processes. It would be very easy to accidentally pass through a flag like
--eval. Being explicit and enforcing this, prevents such regressions.
- This results in reduced bytes for the Brotli-encoded state into
cc. @cspotcode. I'm super open to not always go to the child entry-point (when e.g. ESM is not enabled), but I believe this is worth considering as it just makes all of the boostrapping more readable / obvious (while we talk about feature creep and code smell)
Thanks, I'll have to give this some thought.
It's a breaking change to use a child process for all users. It affects PIDs and signal handling and whatnot. And it's a performance regression for users who only need CJS.
Some of us in the ecosystem believe that the child process will eventually be unnecessary, once node adds the appropriate features. https://github.com/nodejs/node/issues/33903 https://github.com/sindresorhus/import-from/issues/9 https://github.com/gulpjs/rechoir/issues/43#issuecomment-1170596886 A heated discussion cropped up a few hours ago: https://github.com/nodejs/node/issues/43818
...so I kinda want to treat the child process as a hack that's only done when absolutely necessary.
To get some of the code quality benefits you describe, I suppose we could pass the brotli payload in-process even when we don't spawn a child. So like, create the payload, and pass it in-memory to the next phase of bootstrapping.
That gives us some of the safety and code readability guarantees you describe without imposing the breaking change / perf hit on CJS consumers.
@cspotcode Yeah, it's trivial to just jump into the final phase in-memory (from the same process). Good points you raised there. I will make this change and rebase the PR. Should be ready for your review/thoughts tomorrow.
Codecov Report
Merging #1836 (9c0a5a9) into main (97f9afd) will decrease coverage by
0.00%. The diff coverage is93.58%.
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/child/spawn-child-with-esm.ts | 82.35% <82.35%> (ø) |
|
| src/bin.ts | 88.29% <95.74%> (-0.17%) |
:arrow_down: |
| src/child/child-entrypoint.ts | 87.50% <100.00%> (-5.36%) |
:arrow_down: |
| src/child/child-exec-args.ts | 100.00% <100.00%> (ø) |
|
| src/child/argv-payload.ts | 83.33% <0.00%> (-16.67%) |
:arrow_down: |
Only skimming the code, so I may have misinterpreted. But FWIW it's ok to make the breaking change to stop resolving entrypoint relative to our --cwd. Not sure if that simplifies things.
Also feel free to push a long commit history, no need to force-push if you don't want to. We will squash merge it to main anyway.
@cspotcode Thanks for looking. Sounds good. I will not squash the fixup commits (just a habit of other repos). Regarding --cwd. It's low-effort to keep it working as part of this PR, but it will be easy to remove this in a follow-up. thx
Sounds good. FWIW I'm going to start merging breaking changes to main, so we're fully in breakage mode.
Nice! just for your information: This PR is now green and ready for review when you find some time. No rush, thanks!
Thanks, can I get a quick bulleted list of the changes and their motivation, to assist review? In another comment, or by updating the issue description?
ts-node --esm avoids loading the typescript compiler in the parent process, which is good for performance. Loading TS compiler is slow. Does this PR preserve that optimization?
@cspotcode This PR:
- Cleans up the
BootstrapStatefor the new approach, removing all the detection flags of whether we are in the CLI. - Introduces a new type for the bootstrap state that can be encoded into the Brotli config.
- This is a subset of the bootstrap state with explicit fields that can be cached
- This makes it obvious which stuff can be cached/encoded, and also encourages thinking before new stuff is passed through.
- Also it reduces the bytes encoded in the payload
-
getEntryPointInfo(coming from the previous ESM fix PR) no longer takes the full bootstrap state. It just fine-grained takes what it needs. This is needed because only the necessary properties/state is cached, so it cannot expect the full parsed arguments (see 2) - As we chatted, we now always encode the Brotli state (regardless of ESM or with ESM) into
process.execArgv. This ensures that phase1-3 do not need to be repeated when forked (and also makes the mental model easy and more expected).- A shared function is used for the child process script and
execArgvcomputation. This logic can be used for settingexecArgvand for getting the args for going directly frombin.jsinto the child process when ESM is needed.
- A shared function is used for the child process script and
- The existing
index.spec.tsforking recursiveexecArgvtest had to be updated:- So that is now knows that forked processes always go to the child entry-point with Brotli config (see 4)
- So that we also test the forking
execArgvfor ESM (aside from the worker actual fork tests we have)
- Some of the forking tests I added with my first PR have been renamed and adjusted to not rely on
--cwdbut just use the working directory when spawningts-node(like cd dir && ts-node).
is this sufficient information?
ts-node --esm avoids loading the typescript compiler in the parent process, which is good for performance. Loading TS compiler is slow. Does this PR preserve that optimization?
This is something I'm not sure if worth doing. It would introduce some complexity again, for something that you also acknowledged to be just a workaround in the long-term. I do realize that loading TS is usually a little slow, but is it that big of a deal, people noticing in the meanwhile? Also because it already loads twice if people enable esm in the tsconfig. The complexity will be that we now would also need to pass all parseArgv result into the initial child entry-point (needed for phase3), and so on..
@cspotcode Please find the bullet-point list you requested in the above comment. I also added the fast-path to avoid loading the TS compiler when we detect --esm before loading the tsconfig. I was able to actually re-add this without making the code more complicated as I reworked some of the types/structs I created with the initial commits. See the latest fixup.
I also rebased on top of your breaking changes in main
Hey @cspotcode, just wanted to check if there is any way I can help move this forward. No rush, just a friendly check-in
Friendly ping on this again. Happy to make any changes. also happy to discuss this if this doesn't look reasonable.
It's been a while. I'm ready to rebase this, or close this, but would like to get it off my plate. Please give an update.
@cspotcode It's unfortunate that I need to close this— due to no response at all. It would have been totally fine to say that it's not worth the review effort given lack of time etc..
In general- I think the current code in main could definitely benefit from some improved type-safety and avoiding unnecessary work when working. At least we were able to merge in https://github.com/TypeStrong/ts-node/pull/1814 to fix forking separately.