Component Size & Performance: JCO/SpiderMonkey vs QuickJS
Hi there,
Let me start by saying that I'm missing a lot of context and that my approach is extremely naive, so please tell me to bugger off. Yet I would still like to better understand the rationale or the state of the world.
I'm playing around with various WASI components, mostly in Rust and JS. I certainly noticed the chunky component size of my JS components whenever wasmtime had to recompile them. I didn't think too much of it, assuming that's just the price to pay for bundling an interpreter. However, I eventually built a Rust WASI component bundling QuickJS (via the rquickjs crate), which turned out to be slightly faster and significantly smaller. For comparison:
- JCO => 13M
- JCO + weval/aot => 29M
- Rust + QuickJS => 1.9M
Running a naive fibonacci implementation just to get a sense of performance, I get for fib(40):
- JCO => 45s
- JCO + weval/aot => 29s
- Rust + QuickJS => 26s
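For concreteness, the benchmark was along these lines (a sketch; the exact harness isn't shown above, so the timing wrapper is my assumption):

```javascript
// Naive doubly-recursive fibonacci: deliberately slow, so it mostly
// exercises raw interpreter call and arithmetic overhead.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// fib(30) here to keep it quick; the numbers above use fib(40).
const start = Date.now();
const result = fib(30);
console.log(`fib(30) = ${result} in ${Date.now() - start}ms`);
```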
And startup codegen/compile times are hugely different, roughly proportional to the difference in component size.
I've no clue if this is actually SpiderMonkey or if there's something else going on; either way, I didn't expect the difference to be quite so substantial. Could you help me understand what I'm missing, or is this an opportunity?
Thanks, Sebastian
EDIT: I'm happy to back this up with code examples; I mostly just wanted to reach out and see if this is something you're aware of or have seen before.
Unfortunately this isn't an opportunity, exactly. At least not a quick and easy one.
I'll tackle the size and performance questions separately, but before that: the key reason for choosing SpiderMonkey instead of QuickJS is that in my opinion QuickJS, while an impressive piece of technology for what it is, is not a good basis for a production JS runtime. Before building the precursor to StarlingMonkey, I built a runtime based on QuickJS, and eventually abandoned it. There were some performance reasons—see below—but I would've made the same decision even if QuickJS had been faster at whatever I tried.
The key reason is that it's exceptionally hard to use QuickJS's embedding API without introducing very subtle memory safety issues, and that I[^1] don't think the engine itself should be deeply trusted in this regard. There are lots of use cases for which that's entirely fine inside a wasm sandbox, but we're not only targeting those.
Regarding performance: there are indeed a number of kinds of workload that QuickJS is faster on than SpiderMonkey when compiled to Wasm. However, one thing we realized back when we made this decision is that there are other cases where QuickJS is fundamentally slower: one of the things that the hundreds of person-years companies poured into competitive JS engines over the last couple of decades led to is that those engines have a lot of optimizations to avoid exceedingly slow paths. I don't remember the details anymore (this all having been about 4 years ago), but we encountered scenarios where QuickJS had roughly O(n^2) performance for a given input length, while SpiderMonkey had O(n).
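To illustrate the kind of asymptotic gap meant here (this is a generic example, not the actual scenario encountered back then): repeated string concatenation is a classic case where engine internals matter. Engines with rope-style string representations keep the loop below roughly O(n), while a naive flat-copy implementation reallocates and copies on every append, degrading to O(n^2).

```javascript
// Builds a string of length n by repeated concatenation.
// Rope-backed engines amortize this; naive engines copy the whole
// string on each iteration, so total work grows quadratically.
function buildString(n) {
  let s = "";
  for (let i = 0; i < n; i++) {
    s += "x";
  }
  return s;
}

console.log(buildString(1000).length);
```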
I guess the TL;DR for all this is that there's a wide gulf between JS engines that are exposed to the hostile security environment that is the web and that partook in the browser performance wars, and those that didn't.
On file size: SpiderMonkey components would always be meaningfully larger than QuickJS ones, but it's also a simple reality that nobody ever invested any time at all into trying to shrink them. It's quite simply never been high enough on our list of priorities. There would be lots of easy wins to be had for sure: the build configuration for SpiderMonkey includes lots of stuff that is definitely dead code, but in ways the compiler can't readily tell. To give just two examples, there's lots of stuff in there for dealing with JIT compilation, even though we don't support that at all; and there's also lots of Wasm related stuff in there, even though we can't run Wasm in SpiderMonkey in Wasm :)
And startup codegen/compile times are hugely different, roughly proportional to the difference in component size.
I don't know what your use case looks like, but one thing to check is if you can make use of pre-compilation and pre-instantiation. The combination of these should fully mitigate these issues if you can make use of them. (The docs I linked to are for Wasmtime, but it's possible that other runtimes would have similar optimizations.)
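In case it helps, the Wasmtime CLI flow for pre-compilation looks roughly like this (a sketch; file names are placeholders, and exact flags depend on your Wasmtime version):

```shell
# Pre-compile once, producing a native-code artifact; this moves all
# Cranelift codegen out of the startup path.
wasmtime compile component.wasm -o component.cwasm

# Running a pre-compiled artifact requires explicitly opting in.
wasmtime run --allow-precompiled component.cwasm
```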
^1: I should note that I used to be on the SpiderMonkey team, so I'm not entirely an objective party in this assessment. At the same time it also means that it's a somewhat informed assessment.
Thanks for the detailed response, much appreciated 🙏
Unfortunately this isn't an opportunity, exactly. At least not a quick and easy one.
Nobody said easy :hide: 😅
The key reason is that it's exceptionally hard to use QuickJS's embedding API without introducing very subtle memory safety issues, and that I[^1] don't think the engine itself should be deeply trusted in this regard. There are lots of use cases for which that's entirely fine inside a wasm sandbox, but we're not only targeting those.
That's exactly the context I was hoping for. I've no reason to doubt your expert assessment as to what solution provides the best trade-offs.
one of the things that the hundreds of person-years companies poured into competitive JS engines over the last couple of decades led to is that they have a lot of optimizations to avoid exceedingly slow paths... I guess the TL;DR for all this is that there's a wide gulf between JS engines that are exposed to the hostile security environment that is the web and that partook in the browser performance wars, and those that didn't.
Maybe, though not counting all the meetings, OKRs, and perf cycles. I don't understand QuickJS well enough to argue either way. I believe in the magic of driven ICs, otherwise I'd have to switch to Windows 😅 Jokes aside, I'm very interested in the technical arguments. If QuickJS has sharp, pathological edges, that may be a strong reason against picking it as a default.
(Just a random shout from the side-lines w/o pushing anything: have you ever looked at: https://github.com/facebook/hermes ? FWIW, I understand that any change would be expensive and there would need to be a compelling reason especially if the SpiderMonkey setup still has plenty of room for optimizations)
On file size: SpiderMonkey components would always be meaningfully larger than QuickJS ones, but it's also a simple reality that nobody ever invested any time at all into trying to shrink them. It's quite simply never been high enough on our list of priorities. There would be lots of easy wins to be had for sure: the build configuration for SpiderMonkey includes lots of stuff that is definitely dead code, but in ways the compiler can't readily tell. To give just two examples, there's lots of stuff in there for dealing with JIT compilation, even though we don't support that at all; and there's also lots of Wasm related stuff in there, even though we can't run Wasm in SpiderMonkey in Wasm :)
That sounds very interesting! For context, let me quickly address the suggested pre-compilation...
I don't know what your use case looks like, but one thing to check is if you can make use of pre-compilation and pre-instantiation. The combination of these should fully mitigate these issues if you make can make use of them. (The docs I linked to are for Wasmtime, but it's possible that other runtimes would have similar optimizations.)
Agreed, this can greatly help with things like prod startup times.
Maybe as a less suitable use-case: I'm also looking at a hot-restart development flow. Imagine editing a file, a file watcher picking it up, repackaging the component, and then signalling a wasmtime runtime to pick up the new component, effectively recompiling it. At the moment, this takes tens of seconds vs ~1s with QuickJS. If a SpiderMonkey, or more generally JS, component could shed some weight, this would help a great deal.
Slimming down the current setup: is that something that could be tangible?
(Just a random shout from the side-lines w/o pushing anything: have you ever looked at: facebook/hermes ?
I have, yes: it targets ES6 plus some extensions, not the latest JS standards as SpiderMonkey does. I didn't look closely into the embedding API, so I can't comment on how easy or hard it'd be to implement all the additional builtins we provide.
Which is something I didn't emphasize in my last reply: embedding an engine to just execute some code and provide a trivial custom API to JS is very different compared to implementing a whole bunch of additional functionality as we do in StarlingMonkey. That's where differences in embedding APIs really become important.
Maybe as a less suitable use-case: I'm also looking at a hot-restart development flow. Imagine editing a file, a file watcher picking it up, repackaging the component, and then signalling a wasmtime runtime to pick up the new component, effectively recompiling it. At the moment, this takes tens of seconds vs ~1s with QuickJS. If a SpiderMonkey, or more generally JS, component could shed some weight, this would help a great deal.
I'm curious to hear what your development environment looks like. I use just this kind of workflow on a daily basis, and on my (M2 Macbook Pro) laptop it's about a couple of seconds—certainly not 10s of seconds.
That being said, we certainly should do way better on this—and we can: StarlingMonkey itself already has a runtime-loading mode where you create a component that loads JS at instantiation time, instead of it being baked into a snapshot. That's something we can enable for ComponentizeJS as well, though it's not entirely trivial.
Oh, and to clarify: that mode makes it so that no compilation is happening at all during the development cycle: you edit your code and send a new request, which always loads the latest version of the code, without the runtime itself being recompiled.
Which is something I didn't emphasize in my last reply: embedding an engine to just execute some code and provide a trivial custom API to JS is very different compared to implementing a whole bunch of additional functionality as we do in StarlingMonkey. That's where differences in embedding APIs really become important.
Understood.
I'm curious to hear what your development environment looks like. I use just this kind of workflow on a daily basis, and on my (M2 Macbook Pro) laptop it's about a couple of seconds—certainly not 10s of seconds.
My notebook rocks a decently crisp 7840u (8 cores/ 16 threads).
My flow:
- build inputs from local JS + deps using vite + tsc: 91ms
- `jco componentize dist/index.mjs -w ../../guests/typescript/wit -o dist/component.wasm` takes around 4.2s
- Using wasmtime with parallel compilation enabled (admittedly optimized for speed) to compile the 13MB wasm component takes 53s
Oh, and to clarify: that mode makes it so that no compilation is happening at all during the development cycle: you edit your code and send a new request, which always loads the latest version of the code, without the runtime itself being recompiled
Sounds like wasmtime has an interpreted mode? That would certainly help with the biggest chunk.
- Using wasmtime with parallel compilation enabled (admittedly optimized for speed) to compile the 13MB wasm component takes 53s
That is way way way longer than it should be. Would you mind filing a bug about it over on Wasmtime? The more info, the better, with the ideal of course being full steps to reproduce and/or a wasm file to test with.
Sounds like wasmtime has an interpreted mode? That would certainly help with the biggest chunk.
It does, as well as a baseline compiler. But that's not even what I mean: if ComponentizeJS had a way to create a wasm component that contains the generated WIT bindings, but not any guest JS code, and that code would be loaded at runtime instead, then you could make changes to your guest code as you see fit without ever having to recompile the wasm, or even restart wasmtime at all. StarlingMonkey has such a mode, but ComponentizeJS for now does not, unfortunately.
That is way way way longer than it should be. Would you mind filing a bug about it over on Wasmtime? The more info, the better, with the ideal of course being full steps to reproduce and/or a wasm file to test with.
Mea culpa. My wasmtime was not built in release mode. In release mode, compiling the 13MB WASM bundle takes about 4.1s, which, combined with the initial WASM build, puts me just about at sub-10s on a fairly strong machine.
I guess it's not unreasonable to expect the builds to take multiple seconds when compiling a largish JS interpreter, including its own dangling JIT and WASM engine.
Some improvement would definitely be greatly appreciated, such as your dynamically loading idea and/or stripping unused parts of the interpreter.
Just for reference: my custom QuickJS build (which arguably is apples to oranges) is 1.9MB and takes 650ms to compile, which for both size and time is roughly a 1:7 ratio.
It does, as well as a baseline compiler. But that's not even what I mean: if ComponentizeJS had a way to create a wasm component that contains the generated WIT bindings, but not any guest JS code, and that code would be loaded at runtime instead, then you could make changes to your guest code as you see fit without ever having to recompile the wasm, or even restart wasmtime at all. StarlingMonkey has such a mode, but ComponentizeJS for now does not, unfortunately.
That makes a lot of sense, and would be fantastic. This may even be more generally useful 🤷♀️ Unless you're using weval, there wouldn't be much of a runtime performance downside. I'm not too familiar with the WASM binary format or link flow, but if one could bundle a pre-built/cached engine with the script, this might also speed up the initial build step (...sorry, just throwing ideas at the wall hoping something makes sense 😅)
Really appreciate your help and input 🙏
That being said, we certainly should do way better on this—and we can: StarlingMonkey itself already has a runtime-loading mode where you create a component that loads JS at instantiation time, instead of it being baked into a snapshot. That's something we can enable for ComponentizeJS as well, though it's not entirely trivial.
Not wanting to be pushy but merely curious, were there any discussions on your end? I'm mostly trying to follow up, because I received some feedback on the compile-times myself and your suggestion would greatly alleviate all issues 🙏