Repeated calls to an instance start failing
We're testing long-lived component instances created with ComponentizeJS. After some number of calls to an instance, we start getting failures: wasm traps such as unreachable or uninitialized value. The number of calls before a failure seems to vary based on the functions called and the data returned, but it consistently fails given the same pattern of calls.
I've created a test case using the ComponentizeJS test setup here: https://github.com/bytecodealliance/ComponentizeJS/compare/main...pvlugter:ComponentizeJS:repeated-calls
It fails after 1805 calls with:
RuntimeError: failed on attempt [1806]: null function or function signature mismatch
at js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (wasm://wasm/029a3d3a:wasm-function[337]:0x193642)
at js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) (wasm://wasm/029a3d3a:wasm-function[6458]:0x58cc39)
at JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) (wasm://wasm/029a3d3a:wasm-function[1448]:0x31b2b4)
at call(unsigned int, void*) (wasm://wasm/029a3d3a:wasm-function[500]:0x1e76dc)
at exports#hello (wasm://wasm/029a3d3a:wasm-function[14509]:0x67e160)
at Object.hello (file:///.../ComponentizeJS/test/output/repeated-calls/repeated-calls.js:1430:40)
at Module.test (file:///.../ComponentizeJS/test/cases/repeated-calls/test.js:6:36)
at Context.<anonymous> (file:///.../ComponentizeJS/test/test.js:138:18)
Thanks for sharing a reproduction here; this seems like an overflow case. Will aim to look into it further soon.
With the upgrade to the new StarlingMonkey engine, this test is no longer failing. I've still merged it in to ensure there are no future regressions though - https://github.com/bytecodealliance/ComponentizeJS/commit/a9e071dab4168f5f1458d8dc71f41fe5f3c30bc8.
Thanks. Tried a local version of this on our larger tests and it does still fail. Trying the simple repeated-calls test here again, it also eventually fails, just with many more calls required:
RuntimeError: failed on attempt [172515]: unreachable
at wasm://wasm/030be13a:wasm-function[8641]:0x73f1a3
at wasm://wasm/030be13a:wasm-function[5761]:0x6c51cd
at wasm://wasm/030be13a:wasm-function[622]:0x334e66
at wasm://wasm/030be13a:wasm-function[4755]:0x67eb46
at wasm://wasm/030be13a:wasm-function[7539]:0x71c98d
at wasm://wasm/030be13a:wasm-function[688]:0x35f6d0
at wasm://wasm/030be13a:wasm-function[1599]:0x4bdb90
at wasm://wasm/030be13a:wasm-function[212]:0xc4ea0
at wasm://wasm/030be13a:wasm-function[623]:0x335d87
at wasm://wasm/030be13a:wasm-function[622]:0x334cc1
at wasm://wasm/030be13a:wasm-function[4755]:0x67eb46
at wasm://wasm/030be13a:wasm-function[1729]:0x4de08d
at wasm://wasm/030be13a:wasm-function[868]:0x3c05cc
at exports#hello (wasm://wasm/030be13a:wasm-function[12402]:0x775082)
at Object.hello (file:///.../ComponentizeJS/test/output/repeated-calls/repeated-calls.js:17544:40)
at Module.test (file:///.../ComponentizeJS/test/cases/repeated-calls/test.js:6:34)
at Context.<anonymous> (file:///.../ComponentizeJS/test/test.js:182:18)
In our use case, with more complex data structures, it's fewer than 1000 calls. We're running stateless components, so our current workaround is to run the instance until failure and then recreate it, which keeps performance reasonable.
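For illustration, here's a rough sketch of that recreate-on-trap pattern, assuming a Rust host using wasmtime's component API; the component.wasm path, the hello export and its string signature, and the WASI wiring shown are placeholders rather than our actual setup:

```rust
use wasmtime::component::{Component, Linker, ResourceTable};
use wasmtime::{Config, Engine, Store};
use wasmtime_wasi::{WasiCtx, WasiCtxBuilder, WasiView};

struct Host {
    table: ResourceTable,
    wasi: WasiCtx,
}

impl WasiView for Host {
    fn table(&mut self) -> &mut ResourceTable { &mut self.table }
    fn ctx(&mut self) -> &mut WasiCtx { &mut self.wasi }
}

fn new_store(engine: &Engine) -> Store<Host> {
    Store::new(engine, Host {
        table: ResourceTable::new(),
        wasi: WasiCtxBuilder::new().build(),
    })
}

fn main() -> wasmtime::Result<()> {
    let mut config = Config::new();
    config.wasm_component_model(true);
    let engine = Engine::new(&config)?;
    let component = Component::from_file(&engine, "component.wasm")?;

    let mut linker = Linker::new(&engine);
    // Depending on what the component imports, more interfaces may need linking here.
    wasmtime_wasi::add_to_linker_sync(&mut linker)?;

    let mut store = new_store(&engine);
    let mut instance = linker.instantiate(&mut store, &component)?;

    for i in 0..10_000u32 {
        // Look the export up each iteration so it always refers to the current instance.
        let hello = instance.get_typed_func::<(String,), (String,)>(&mut store, "hello")?;
        match hello.call(&mut store, (format!("call {i}"),)) {
            Ok((_out,)) => hello.post_return(&mut store)?,
            Err(trap) => {
                // The component is stateless, so on a trap drop the old store and
                // instance and carry on with fresh ones.
                eprintln!("trap on call {i}: {trap:#}; recreating instance");
                store = new_store(&engine);
                instance = linker.instantiate(&mut store, &component)?;
            }
        }
    }
    Ok(())
}
```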
Are you sure you're building the local version correctly? I'm publishing a release shortly, so perhaps test against that? I did try your repeated-calls test case with 20,000 calls and it still works fine.
I think the local version is correctly built. Ran repeated-calls with 200,000 calls; it didn't fail until 172,515.
I'll try our own tests with the published release once available.
Tested 0.8.0 with one of our own tests. Fails after 1164 calls. For 0.7.1 it was just 92 calls before failure. With the local version I tried, on ebeb262, it was 973 calls.
Thanks for the report; at least the numbers are getting bigger. Reopening.
A first step here might be to debug whether this is a GC issue or a broader allocation issue in ComponentizeJS. Another useful isolation step could be to independently test StarlingMonkey with some GC objects in an exported interface, to see whether it's definitely happening on the ComponentizeJS side.
Hi, we're seeing something similar in a plugin system we are developing. The host is in Rust and uses wasmtime v23.0.2.
The host repeatedly calls a function from the wasm component to apply a transformation to a stream of byte chunks. It works for a few iterations but crashes after a while.
The error occurs reliably after the same number of calls for a particular input, but that number changes with different inputs.
Our setup has been working fine with other languages for the guest plugin (Go, Python, Rust), so we think we might be facing the issue described here.
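For context, the host-side loop is roughly the sketch below. This is simplified: the transform export is assumed to take and return list<u8>, and the engine/linker/WASI setup is omitted since it's the usual wasmtime component host wiring.

```rust
use wasmtime::component::Instance;
use wasmtime::Store;

// Apply the component's `transform` export to each chunk of the stream.
fn run_transform<T>(
    store: &mut Store<T>,
    instance: &Instance,
    chunks: impl IntoIterator<Item = Vec<u8>>,
) -> wasmtime::Result<Vec<Vec<u8>>> {
    // Assumed export shape: transform: func(chunk: list<u8>) -> list<u8>
    let transform =
        instance.get_typed_func::<(Vec<u8>,), (Vec<u8>,)>(&mut *store, "transform")?;
    let mut out = Vec::new();
    for (i, chunk) in chunks.into_iter().enumerate() {
        let (transformed,) = transform
            .call(&mut *store, (chunk,))
            .map_err(|err| err.context(format!("transform failed on chunk {i}")))?;
        // post_return is required after each successful call before the export
        // can be called again on the same instance.
        transform.post_return(&mut *store)?;
        out.push(transformed);
    }
    Ok(out)
}
```

The crash surfaces as the Err returned from the call inside this loop, at a consistent iteration for a given input.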
The backtrace doesn't seem particularly helpful but I'll provide it anyway:
0: 0x772dbc - <unknown>!<wasm function 8792>
1: 0x714957 - <unknown>!<wasm function 6347>
2: 0x350f50 - <unknown>!<wasm function 450>
3: 0x6abb33 - <unknown>!<wasm function 4725>
4: 0x74cfe9 - <unknown>!<wasm function 7590>
5: 0x36a6df - <unknown>!<wasm function 490>
6: 0x4dc0d1 - <unknown>!<wasm function 1456>
7: 0xa86f3 - <unknown>!<wasm function 8>
8: 0x34109d - <unknown>!<wasm function 426>
9: 0x350e15 - <unknown>!<wasm function 450>
10: 0x6abb33 - <unknown>!<wasm function 4725>
11: 0x518f19 - <unknown>!<wasm function 1706>
12: 0x3c2a0e - <unknown>!<wasm function 652>
13: 0x764085 - <unknown>!transform
I think I've hit the same issue, as Guy noted with the issue mention. In case it helps with debugging, here is a repo with the code I'm componentizing: https://github.com/stadiamaps/pelias-parser-component.
Just npm install and npm run build to get a Wasm component. My test setup was to instantiate the component and then call the exported function in a criterion bench. After some number of iterations, which varied based on input and the specific build (e.g. minor changes that didn't really affect the end result), it hits an unreachable instruction. For any given binary (component) + input pair, the number of iterations before failure seems to be fixed.
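Roughly what the bench harness looked like, as a sketch: the parse export name and its string-in/string-out signature, the component path, and the sample input are placeholders, with WASI linked via wasmtime-wasi:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use wasmtime::component::{Component, Linker, ResourceTable};
use wasmtime::{Config, Engine, Store};
use wasmtime_wasi::{WasiCtx, WasiCtxBuilder, WasiView};

struct Host {
    table: ResourceTable,
    wasi: WasiCtx,
}

impl WasiView for Host {
    fn table(&mut self) -> &mut ResourceTable { &mut self.table }
    fn ctx(&mut self) -> &mut WasiCtx { &mut self.wasi }
}

fn bench_component(c: &mut Criterion) {
    let mut config = Config::new();
    config.wasm_component_model(true);
    let engine = Engine::new(&config).unwrap();
    let component = Component::from_file(&engine, "pelias-parser.component.wasm").unwrap();
    let mut linker = Linker::new(&engine);
    wasmtime_wasi::add_to_linker_sync(&mut linker).unwrap();

    let mut store = Store::new(&engine, Host {
        table: ResourceTable::new(),
        wasi: WasiCtxBuilder::new().build(),
    });
    let instance = linker.instantiate(&mut store, &component).unwrap();
    let parse = instance
        .get_typed_func::<(String,), (String,)>(&mut store, "parse")
        .unwrap();

    // The component is instantiated once; only the repeated calls are benchmarked,
    // which is the pattern that eventually hits the unreachable trap.
    c.bench_function("parse", |b| {
        b.iter(|| {
            let (_out,) = parse
                .call(&mut store, ("30 w 26th st, new york".to_string(),))
                .unwrap();
            parse.post_return(&mut store).unwrap();
        })
    });
}

criterion_group!(benches, bench_component);
criterion_main!(benches);
```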
Hey, going to try to find time to look into this within the next couple of weeks; it's certainly a head scratcher.
Will see if I can make a smaller/simpler reproduction, since it seems to be simply performing calls repeatedly that triggers this.