Tracking issue: compile cache
Follow up to https://github.com/nodejs/node/issues/47472 . Some items that can be investigated:
- [ ] Exposing an API for user code to control the caching https://github.com/nodejs/node/pull/52535#issuecomment-2059390083
- [ ] Idle-time cache serialization like what Blink does, to avoid penalizing the first load
- [ ] Other hashing algorithm (CRC32 may be good enough for our use case. In the initial implementation, it was chosen because it can be used on no-crypto builds and is fast enough. For reference, ccache has used MD4 and later BLAKE2b -> BLAKE3)
- [ ] Other directory layout (splitting the cache per file and reading it on the fly seems to be fast enough, and I don't really see I/O showing up in the profile anyway) or using a db (if/when we implement Web Storage?)
- [ ] Embedder API for configuring the storage
- [ ] Inode caching like https://github.com/ccache/ccache/pull/577 (note that CRC32 also barely shows up in the profile, so it may not be worth the complexity).
- [ ] Avoid UTF8 transcoding by directly reading the source code as buffer from disk (this needs to dance with CJS loader monkey patching)
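
To make the hashing item above concrete, here is a minimal sketch of a table-driven CRC32 computed directly over raw bytes, so it needs no crypto support and no UTF-8 decoding of the source. The `cacheKey` helper and its `name-content` key format are assumptions made for illustration only; this is not how the actual implementation names its cache entries.

```javascript
// Precompute the 256-entry lookup table for the reflected CRC-32
// polynomial 0xEDB88320 (the variant used by zlib, gzip, and PNG).
const CRC32_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC32_TABLE[n] = c >>> 0;
}

// Compute CRC-32 over raw bytes (Buffer or Uint8Array).
// Returns an unsigned 32-bit integer.
function crc32(bytes) {
  let crc = 0xFFFFFFFF;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC32_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Hypothetical helper (not a Node.js API): derive a cache entry name
// from the hash of the filename and the hash of the source bytes.
function cacheKey(filename, sourceBytes) {
  const hex = (n) => n.toString(16).padStart(8, '0');
  return `${hex(crc32(Buffer.from(filename, 'utf8')))}-${hex(crc32(sourceBytes))}`;
}
```

With this shape, a caller could pass the `Buffer` returned by `fs.readFileSync(filename)` (called without an encoding) straight to `cacheKey`, which is the kind of buffer-first flow the last item in the list is about.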
> Avoid UTF8 transcoding by directly reading the source code as buffer from disk (this needs to dance with CJS loader monkey patching)
FWIW I think loaders/require hooks are probably very common in dev and pretty rare in production (where compile cache has the most value) but that's just intuition.
> and pretty rare in production

I would think it's the opposite for tracing agents, although they usually don't care about the source code (except the current loaders built on top of the off-thread hooks, like import-in-the-middle, that are forced to do a hacky analysis of the source code, which is why I am proposing an in-thread link() hook for them in https://github.com/nodejs/loaders/pull/198 so they don't have to do this).
Also, speaking of loader hooks, I think we need to convert the CJS loader to pass buffers around regardless for future binary file loading support (for example if the custom loader wants to support loading wasm, or zip, or anything that's not stored as uh, bytes encoded in UTF8 on disk).
(Now I am spamming this tracking issue but) after looking into existing monkey patching usages in popular packages (via a GitHub code search), I think the highest-priority item should be an API for packages to turn this on programmatically. I don't have a great idea about what this API should look like though, so ideas are welcome. (Maybe process.enableCompileCache(dir) with some re-entrancy guards would be good enough, or maybe it's a terrible idea to make it per-thread because packages can step on each other's toes?).
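
As a starting point for discussion, one possible shape for the re-entrancy guard mentioned above is sketched below. Everything here is hypothetical: `createCompileCacheSwitch`, the `turnOn` callback (standing in for whatever internal hook would actually enable the cache), and the returned status strings are made up for illustration and are not an existing Node.js API.

```javascript
// Hypothetical sketch, NOT a Node.js API. `turnOn(dir)` stands in for
// the internal operation that would actually enable the compile cache.
function createCompileCacheSwitch(turnOn) {
  let enabledDir = null; // directory in effect, or null if not enabled yet

  return function enableCompileCache(dir) {
    if (enabledDir === null) {
      // First caller wins. If turnOn throws, state stays "not enabled".
      turnOn(dir);
      enabledDir = dir;
      return { status: 'enabled', directory: dir };
    }
    // Later callers never reconfigure the cache; they only learn what
    // is already in effect so they can decide how to react.
    const status = enabledDir === dir ? 'already-enabled' : 'conflict';
    return { status, directory: enabledDir };
  };
}
```

The idea is that the first package to call the function decides the directory, and later callers get a result object instead of an exception, so two packages asking for different directories cannot silently override each other. Whether that state should live per-thread or per-process is exactly the open question above.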
> Other hashing algorithm

xxHash, by the creator of zstd, seems like a good non-cryptographic hash algorithm.