
Extend `wasm-tools component new` to support preexisting wasip1 modules without cabi_realloc

Open chenyan2002 opened this issue 5 months ago • 10 comments

Currently wasm-tools component new uses a special adapter to convert WASI p1 to p2. The adapter only works if it can take a dedicated space from the main module's memory, either via cabi_realloc or memory.grow. For managed languages, such as Go or Python, the language runtime usually doesn't export cabi_realloc, and a direct memory.grow without properly notifying the GC can cause memory corruption.
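For reference, the export the adapter looks for is the canonical ABI realloc. Below is a minimal sketch of what a guest might export, assuming a Rust-style global allocator; zero-size and out-of-memory handling are elided.

```rust
use std::alloc::{self, Layout};

// Minimal sketch of a cabi_realloc export the adapter can use for scratch
// space: (old_ptr, old_len, align, new_len) -> new_ptr. Real bindings
// (e.g. wit-bindgen generated code) handle zero-sized and failed
// allocations more carefully.
#[no_mangle]
pub unsafe extern "C" fn cabi_realloc(
    old_ptr: *mut u8,
    old_len: usize,
    align: usize,
    new_len: usize,
) -> *mut u8 {
    let new_layout = Layout::from_size_align_unchecked(new_len, align);
    if old_len == 0 {
        alloc::alloc(new_layout)
    } else {
        let old_layout = Layout::from_size_align_unchecked(old_len, align);
        alloc::realloc(old_ptr, old_layout, new_len)
    }
}
```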

To resolve this problem in a language-agnostic fashion, there are two options:

  1. In addition to importing the main module's memory (memory 0), the adapter can define a local memory (memory 1) to store its own state. Then the adapter code needs to know when to access memory 0 or memory 1. This can be achieved by annotating the Rust code block, for example,
```rust
if iovs_len == 0 {
    user_mem!({ *nwritten = 0; });
    return ERRNO_SUCCESS;
}
```

The user_mem! macro inserts special wasm instructions at the beginning and the end of the code block, for example pushing a magic number and immediately dropping it (a sketch of such a macro follows the list). We then build a wasm rewriting tool that takes the adapter module, redirects all memory accesses inside the user_mem! regions to memory 0, and points the rest of the code at memory 1.

  2. To avoid multi-memory, we can rewrite the main module to shift all memory addresses by 2 pages. This way, the first two pages of memory are never touched by the main module, so the adapter can safely write its state and stack to those pages without corrupting user memory. The adapter code still needs to know which pointers come from the main module and which come from the adapter: for main-module pointers, we add the offset in order to access the correct location, and if the adapter returns any pointers back to the main module, we keep up the lie by subtracting the offset from the actual address. These can be achieved with a macro similar to option 1, for example *user_ptr!(nwritten) = 0 (see the second sketch below).
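For concreteness, here is a rough sketch of what the user_mem! marker macro from option 1 could look like. The exact marker instructions are an assumption; they just need to be easy for the rewriting tool to find and to survive optimization.

```rust
// Hypothetical user_mem! marker macro for option 1. The black_box calls
// stand in for "emit a recognizable instruction sequence" (for example an
// i32.const of a magic number that is immediately dropped); the rewriting
// tool finds the markers and redirects memory accesses between them to
// memory 0, leaving the rest of the adapter on memory 1.
macro_rules! user_mem {
    ($body:block) => {{
        core::hint::black_box(0xC0DE_2002u32); // begin-of-region marker
        let result = $body;
        core::hint::black_box(0xC0DE_2002u32); // end-of-region marker
        result
    }};
}
```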
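And a corresponding sketch for option 2, where the adapter shifts pointers coming from the main module before dereferencing them. The macro name and the 2-page offset come from the description above; the helper itself is illustrative.

```rust
// Hypothetical user_ptr! helper for option 2: the main module passes
// un-shifted addresses to the adapter, so the adapter adds the 2-page
// offset before dereferencing, and subtracts it again for any pointer it
// hands back to the main module.
const WASM_PAGE: usize = 64 * 1024;
const USER_OFFSET: usize = 2 * WASM_PAGE;

fn shift<T>(ptr: *mut T) -> *mut T {
    (ptr as usize + USER_OFFSET) as *mut T
}

macro_rules! user_ptr {
    ($ptr:expr) => {
        shift($ptr)
    };
}

// Usage inside the adapter, mirroring the example above:
//   unsafe { *user_ptr!(nwritten) = 0; }
```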

I’ve successfully implemented the address shifting approach in https://github.com/fastly/Viceroy/pull/515, which requires no code changes in wit-component, although much of the code in gc.rs could be removed if we adopt this approach. The multi-memory approach is conceptually simpler, but requires more changes in wit-component. I got a table index out-of-bounds error when adapting an empty Rust program, and didn’t dig further.

The goal of this issue is to figure out which approach we prefer to be the official fix.

  • The multi-memory approach doesn’t need to rewrite the user module, and we no longer have the restriction that the adapter code can’t allocate memory or use a table. The downside is that it requires more changes in the wit-component crate, and having an extra memory can increase serving cost in some cases.

  • The address shifting approach doesn’t require code changes in wit-component and doesn’t need multi-memory support, but it needs to rewrite the user module. I’m also not sure how the shifted addresses might affect debugging tools.

Another side note: in https://github.com/fastly/Viceroy/pull/515, I use walrus for wasm rewriting. I understand it’s not always in sync with the latest wasmparser, so bringing that dependency into the wasm-tools repo could be a concern. On the other hand, I do like the id arena feature in walrus, where all the wasm indices are abstracted away from the end user. This makes wasm transformations much easier to write and less error-prone than with wasm-encoder. I’m curious to hear people’s thoughts on using walrus in the repo or having our own version of an id arena. I understand we will lose the parser/encoder roundtrip once we use the arena.

chenyan2002 avatar Aug 26 '25 01:08 chenyan2002

For (1) we opted to avoid doing that due to the runtime cost of adding a linear memory, and it would be forced on all components for quite some time. There's also a tooling issue, as you point out: actually writing this module isn't possible without investing more in tooling, so it's not a clear "lift and shift" to switch to a strategy like that.

For (2) I may not be understanding correctly. If you are shifting data segments in the main module and growing the memory by 2 pages, that won't work. The adapter module shouldn't have anything to shift, though, so I'm assuming you mean the main module? The problem with that would be that linear memory addresses are baked into data segments you aren't shifting.

I thought @erikrose's idea of "use LLD linker symbols to grow the initial heap upwards by 2 pages" was reasonable. Would something like that work?

I understand we will lose the parser/encoder roundtrip once we use the arena.

One other major loss is DWARF debug information, which is why we've historically avoided modifying the main module. It's too useful to have DWARF work, so we keep it working by avoiding modifications to the code section.

alexcrichton avatar Aug 26 '25 14:08 alexcrichton

The adapter module shouldn't have anything to shift, though, so I'm assuming you mean the main module? The problem with that would be that linear memory addresses are baked into data segments you aren't shifting.

Yes, we are only instrumenting the main module. For every memory instruction, we add a 2-page offset to the address before the memory instruction executes. The addresses in the data section are okay, as we only add the offset on the fly when accessing memory. For imported functions, this means that the pointers we pass to the import are still the original addresses without the offset. That's why the adapter code needs to add the offset back before dereferencing.

As you point out, this does mean that we are modifying the main module. The instrumentation is all local to each code block, though, so maybe there is a way to preserve the DWARF info?

chenyan2002 avatar Aug 26 '25 16:08 chenyan2002

That strategy unfortunately wouldn't work if the return value of memory.grow is used to calculate an address (for example this). It would mean the module is silently updated to access memory two pages beyond the extent of the growth, so if memory is grown by one page, an access would trap. Effectively you'd probably also have to update all memory.grow instructions as well.
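For example, an allocator pattern along these lines (purely illustrative; this is not the code behind the link above):

```rust
// A bump-style allocator that derives a pointer from the result of
// memory.grow. If loads and stores were shifted by 2 pages but this grow
// result were left unadjusted, the returned pointer would land 2 pages past
// the newly grown region, and growing by 1 page followed by a write would trap.
#[cfg(target_arch = "wasm32")]
fn grow_heap(delta_pages: usize) -> Option<*mut u8> {
    const PAGE: usize = 64 * 1024;
    // Returns the previous memory size in pages, or usize::MAX on failure.
    let prev_pages = core::arch::wasm32::memory_grow::<0>(delta_pages);
    if prev_pages == usize::MAX {
        return None;
    }
    // Start of the newly grown region, computed from the grow result.
    Some((prev_pages * PAGE) as *mut u8)
}
```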

I'm not aware of a way to reliably keep DWARF working. There are various levels of support in various levels of tools for mapping DWARF, but it rarely captures the full fidelity of the original DWARF.

Is there a reason that the use-the-linker-symbols approach isn't viable?

alexcrichton avatar Aug 26 '25 16:08 alexcrichton

I thought @erikrose's idea of "use LLD linker symbols to grow the initial heap upwards by 2 pages" was reasonable.

For context, I believe Alex is referring to my idea to increment the __heap_base linker symbol. TinyGo pays attention to this symbol, but I don't think Big Go does. "heap_base" does not occur in its codebase. What's more, Big Go implements its own linker, and __heap_base is only a wasm-ld convention. It's still conceivable that Go exposes the heap base somewhere statically determinable (and writeable), but we'd need different logic to extract it.

The other thing that could go wrong with this approach is that adapters currently assume they can allocate arbitrary amounts of RAM at runtime. We would necessarily be putting an ahead-of-time limit on that. However, this should suffice for the p1-to-p2 adapter (whose allocations are fixed-size), and discussions have made me somewhat comfortable that we don't expect future adapters, if any, to be different.

erikrose avatar Aug 26 '25 16:08 erikrose

Effectively you'd probably also have to update all memory.grow instructions as well.

Yes, we subtract 2 pages from memory.grow and memory.size, and we add 2 pages to the initial memory size. Besides these, we instrument all of these instructions: memory.init, memory.fill, memory.copy, load, and store.
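To make the shape of that rewrite concrete, here is a toy sketch over a made-up instruction enum. This is not the actual pass (the Viceroy PR uses walrus); bumping the static load/store offset and adjusting the results of memory.size/memory.grow are one way to apply the shift, and the memory.grow failure case (-1) as well as memory.init/fill/copy are elided.

```rust
// Toy model of the address-shifting instrumentation described above.
// The enum and function exist only for illustration; a real pass would be
// written against walrus or wasmparser/wasm-encoder.
const WASM_PAGE: u32 = 64 * 1024;
const SHIFT_PAGES: u32 = 2;
const SHIFT_BYTES: u32 = SHIFT_PAGES * WASM_PAGE;

enum Instr {
    I32Const(i32),
    I32Sub,
    I32Load { offset: u32 },
    I32Store { offset: u32 },
    MemorySize,
    MemoryGrow,
}

fn shift_instr(instr: Instr, out: &mut Vec<Instr>) {
    match instr {
        // Plain loads and stores can be shifted by bumping the static offset
        // immediate, which adds 2 pages to the effective address.
        Instr::I32Load { offset } => out.push(Instr::I32Load { offset: offset + SHIFT_BYTES }),
        Instr::I32Store { offset } => out.push(Instr::I32Store { offset: offset + SHIFT_BYTES }),
        // memory.size and memory.grow report sizes in pages; hide the two
        // reserved pages from the main module by subtracting them from the
        // result (memory.grow's -1 failure value is not handled here).
        Instr::MemorySize | Instr::MemoryGrow => {
            out.push(instr);
            out.push(Instr::I32Const(SHIFT_PAGES as i32));
            out.push(Instr::I32Sub);
        }
        // memory.init / memory.fill / memory.copy take their address operands
        // on the stack, so they need explicit adds (and scratch locals)
        // inserted in front of them; omitted for brevity.
        other => out.push(other),
    }
}
```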

chenyan2002 avatar Aug 26 '25 17:08 chenyan2002

Ah ok makes sense.

Personally I'd still recommend pushing on upstream languages to make this easier in the tooling. For example, Go could export a __heap_base or something similar; alternatively the allocator could be taught to work with external calls to memory.grow, or the language could support a cabi_realloc, which is required for the component model anyway. I realize that none of this is possible for preexisting modules, but then the phrasing of "support managed languages" in the issue title doesn't seem quite accurate if the main concern is only preexisting modules (and otherwise it's not clear to me why languages can't be updated to support memory.grow or cabi_realloc).

I'd prefer not to bake wasm transformations into wasm-tools that break DWARF information, since preserving it is one of the primary goals of wasm-tools component new. It seems reasonable to leave the transformation on the user side of things, though; wasm-tools component new could be invoked as "place the adapter at this static address" or something like that, and those bits would live here.

alexcrichton avatar Aug 26 '25 17:08 alexcrichton

The other place I'd push on is to not depend on an adapter, and instead implement wasip2 directly in your languages and their libraries. The adapter was intended just for bootstrapping the ecosystem, and has lived longer than any of us expected. We are currently funding an effort to get rid of wasip1 in the wasip2 targets of wasi-libc, in order to no longer require the adapter in wasi-sdk C/C++, as well as in Rust std.

pchickey avatar Aug 26 '25 17:08 pchickey

if the main concern is only preexisting modules

Yes, it's only for preexisting WASI p1 modules. If the upstream language already supports the component model, we wouldn't need to run wasm-tools component new. I think component new is only for preexisting wasip1 modules?

It seems reasonable to leave the transformation on the user side of things.

Makes sense. With the current transformation, we only need to provide our own version of the adapter module when calling component new; no code change is needed here. I can look into the possibility of preserving the DWARF info. The transformation we perform is very local and static, so there is some hope this can work...

not depend on an adapter, and instead implement wasip2 directly in your languages and their libraries.

Totally agree. This is mainly for preexisting wasip1 modules.

chenyan2002 avatar Aug 26 '25 17:08 chenyan2002

There is also the option of trying to get the language toolchain to work and provide a usable cabi_realloc. That's not something that fundamentally can't be done for managed language runtimes—see StarlingMonkey and ComponentizePy.

I also created a proof of concept for this for (Big)Go, here.

tschneidereit avatar Aug 26 '25 17:08 tschneidereit

I think the problem is that when cabi_realloc isn't present, component new falls back to calling memory.grow directly. This fallback can be problematic for managed languages.

Our use case is that we want to migrate all existing wasip1 modules to the component model. We may not have access to the original code, only the wasm module. Extending the upstream language runtime would not work in this case.

chenyan2002 avatar Aug 26 '25 18:08 chenyan2002