wit-bindgen Reimplement host generators in terms of components

I'm opening this issue to document my thinking as a result of trying to resolve issues like https://github.com/bytecodealliance/wit-bindgen/issues/214, https://github.com/bytecodealliance/wit-bindgen/issues/34, and https://github.com/bytecodealliance/wit-bindgen/issues/201. I mentioned a number of issues with the current wit-bindgen architecture in #214, the biggest of which is that there was no clear location to slot a true "component" into the current hosts due to their centralized concept of using a singular wasm module for everything. To that end the best summary I can come up with is that the host-generators of wit-bindgen (gen-host-wasmtime-{py,rust} and gen-host-js) need to be reimplemented in terms of components, not core wasm modules.

In pivoting to components instead of core wasm modules I also believe that there are going to be two classifications of hosts: one where the host natively supports components and one where the host does not. For example gen-host-wasmtime-rust natively supports components, but gen-host-wasmtime-py (which uses Wasmtime's C API) and gen-host-js do not. This is not just a surface-level distinction but will have deep implications for the capability of the code generators: hosts which do not have native component support will only support loading a single statically determined component. Hosts which natively support components (I'll just say Wasmtime from now on to refer to the Rust embedding of Wasmtime) will be able to consume any component that has a particular "shape", e.g. the same set of imports/exports with arbitrary structure internally.

Hosts without component support

As mentioned above, the major change for these hosts is that, unlike today where they attempt to support any "shape" of core wasm module, they will only be able to generate bindings for a single statically known component. The rationale for this is that the imports/exports of a component, which are the most interesting part from an embedder's perspective, are only a small portion of the structure of a component, which can have a lot going on internally. I don't think that wit-bindgen wants to get into the business of generating an entire component runtime for JS, for example, so instead these host generators will ingest a component and spit out bindings for just that one component.

At least for JS this lines up with how it was envisioned being used. In JS or web scenarios the intent is that there's a "thing compiled to wasm" that you want to run, so it's a statically known single module and there's no need for it to be able to instantiate multiple different shapes of modules at runtime. The Python generator for a Wasmtime host is mostly just a proof-of-concept right now and I don't have preconceived notions about how Python is most likely to be used.

The refactoring for the JS host generator (and transitively Python) will be that instead of taking --import and --export for *.wit files the JS host generator would instead "simply" take a component as input. At a high level I view the bindings generation process as:

A *.wasm file, which is a component, is provided as input
The wit-bindgen-gen-host-js crate uses the wasmtime-environ crate, an internal implementation detail of Wasmtime itself, to "decompile" the component
The output is wasmtime-environ's Component structure. The JS host generator would then have the goal of translating that to a JS object with bindings.
Using Component::imports the imports argument to the JS object is generated (probably mirroring core wasm imports as nested maps or something like that)
Next Component::initializers is iterated over, which goes through the process of instantiating a component. For example ExtractRealloc would do something like this.reallocN = previous_wasm_instance.exports['the-realloc-name'];
The LowerImport variant is where the lowering logic lives: a JS function is generated to pass to the import of a core wasm module. This JS function will do what wit-bindgen-gen-host-js does today, taking wasm primitives as arguments and then calling the appropriate JS-imported function, translating the results back to wasm.
Finally the Component::exports array is iterated over to create the exported functions on the JS object. These similarly do what wit-bindgen-gen-host-js does today.
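
The walk over initializers described above can be sketched roughly as follows. This is a minimal TypeScript sketch of the idea, not the generator itself (which is written in Rust): the `Initializer` type is a hypothetical, heavily simplified stand-in for what wasmtime-environ provides, and field names such as `moduleIndex`, `instance`, and `exportName`, as well as the emitted `coreImportsN` and `moduleN.wasm` names, are assumptions made for illustration.

```typescript
// Hypothetical, simplified model of component initializers. The real
// structures in wasmtime-environ carry much more information.
type Initializer =
  | { kind: "InstantiateModule"; moduleIndex: number }
  | { kind: "ExtractRealloc"; index: number; instance: number; exportName: string }
  | { kind: "LowerImport"; index: number; importName: string };

// Emit one line of JS per initializer, in order, mirroring instantiation.
function emitInitializers(inits: Initializer[]): string[] {
  const out: string[] = [];
  let nextInstance = 0;
  for (const init of inits) {
    switch (init.kind) {
      case "InstantiateModule": {
        const n = nextInstance++;
        out.push(
          `this.instance${n} = await WebAssembly.instantiate(await loadWasm("module${init.moduleIndex}.wasm"), coreImports${n});`,
        );
        break;
      }
      case "ExtractRealloc":
        out.push(
          `this.realloc${init.index} = this.instance${init.instance}.exports[${JSON.stringify(init.exportName)}];`,
        );
        break;
      case "LowerImport":
        // A real generator would emit the full lift/lower body here.
        out.push(
          `const lowering${init.index} = (...args) => { /* lift args, call imports[${JSON.stringify(init.importName)}], lower results */ };`,
        );
        break;
    }
  }
  return out;
}

const lines = emitInitializers([
  { kind: "LowerImport", index: 0, importName: "host-log" },
  { kind: "InstantiateModule", moduleIndex: 0 },
  { kind: "ExtractRealloc", index: 0, instance: 0, exportName: "the-realloc-name" },
]);
console.log(lines.join("\n"));
```

The point of the sketch is only that the generated JS is a straight-line transcript of the initializer list, so no component runtime needs to exist on the JS side.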

Overall the contents of wit-bindgen-gen-host-js will largely be kept but there will be a number of internal refactorings to ensure that all the right wires can be connected to the right place. For example loads/stores will grow a parameter for which memory they're referencing and things like that. By taking an actual component as input this will also provide a path forward to implementing features like non-utf-8 strings, memory64 modules, etc. This probably won't be implemented for now but could be implemented in the future as the need arises.
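
As an illustration of the memory parameter mentioned above, load/store helpers might look something like this. This is only a sketch: `loadU32`, `storeU32`, and the `memories` array are hypothetical names, and plain `ArrayBuffer`s stand in for the `WebAssembly.Memory` buffers real bindings would use.

```typescript
// Hypothetical helpers: each load/store names the memory it targets
// instead of assuming a single global memory.
function loadU32(memories: ArrayBuffer[], memory: number, ptr: number): number {
  const buf = memories[memory];
  if (buf === undefined) throw new RangeError(`no memory ${memory}`);
  // Wasm linear memory is little-endian.
  return new DataView(buf).getUint32(ptr, true);
}

function storeU32(memories: ArrayBuffer[], memory: number, ptr: number, val: number): void {
  const buf = memories[memory];
  if (buf === undefined) throw new RangeError(`no memory ${memory}`);
  new DataView(buf).setUint32(ptr, val, true);
}

// Two independent memories, as with two core modules inside one component.
const memories = [new ArrayBuffer(64), new ArrayBuffer(64)];
storeU32(memories, 1, 8, 0xdeadbeef);
console.log(loadU32(memories, 1, 8).toString(16)); // value read back from memory 1
console.log(loadU32(memories, 0, 8)); // memory 0 is untouched
```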

An example is that for this component:

(component
  (import "host-log" (func (result string)))
  ;; ..

  (export "guest-log" (func ...))
)

would generate JS of the form:

class TheComponent { // configurable name from the CLI, probably based on the name of the `*.wasm` or something
    static async instantiate(
        loadWasm: (name: string) => Promise<WebAssembly.Module>,
        imports: TheComponentImports,
    ): Promise<TheComponent> {
        // lowerings, initializers, etc
        return new TheComponent();
    }

    guest_log(arg: string) {
        // uses `this` to lower `arg` and lift results
    }
}

interface TheComponentImports {
    host_log: () => string;
}

(please excuse my JS pseudo-code)

Here the loadWasm will be used to asynchronously load the core wasm blobs that wit-bindgen would spit out (in addition to the JS file here). The imports is the imports object generated from the input component. Otherwise everything is entirely internal within the component translation and is part of the bindings.

Open questions are:

The binary format currently has no affordances for the names of types, so it's not clear how human-readable names for type bindings will be generated
Components-calling-components via adapters should "work" except for the fact that this requires multi-memory-in-core-wasm and I'm not sure if any JS runtimes implement that yet.
Documentation as currently lives in *.wit files would not work any more since there's no location in the binary format for that to live today

Hosts with component support (Wasmtime)

The story above for JS and Python-on-the-host will be radically different for Wasmtime-on-the-host since Wasmtime has native support for components. All of the nitty-gritty of lifting and lowering is handled by the wasmtime crate and derived trait implementations on types. This means that wit-bindgen-gen-host-wasmtime-rust actually does "just" a fairly small amount of work and can be more general with the input it ingests than the JS bindings.

The inputs to the Wasmtime generator today are --import and --export files but I believe this should be removed in favor of *.world files. An input *.world file would then have the Wasmtime generator generate associated submodules/traits for all the necessary components. This would be roughly the same shape of today's code generator but the differences I think will be:

There should be one types module which is a "soup" of all types mentioned everywhere in the *.world file. This would be how Wasmtime would translate from the structural typing of the component model to the nominal typing of Rust.
A Rust module would be generated for each interface in the *.world files, and the types used in the interface would be use'd, possibly renamed, from the types module generated prior.
Imported interfaces would turn into a trait definition.
Exported interfaces would probably all be union'd onto one output generated structure. (details TBD)
One add_to_linker function would be generated which would project from the T of the Store into &mut U where U: ImportedTrait for all the imported interfaces. This would then register all the appropriate names in wasmtime::component::Linker with the appropriate types. Note that no lifting/lowering happens here, that's all handled by wasmtime.
The exported structure would have a method like new_from_instance or similar (same as what's there today) which would then extract all the exports, type-check them, and store TypedFunc references to all the functions.

I think the general structure of Wasmtime's generator won't change much relative to the changes needed for the JS generator. Largely code is just going to get removed and the input to the generator will change to be a *.world instead of a list of exports/imports. Overall I personally feel like more forcing functions are needed to guide the precise design of the generated code here. Most of this is just me shooting in the dark trying to figure out something that's reasonable, but having more concrete use cases where the above doesn't work would help guide tweaks and refinements to improve the generated interfaces.

Should this all still live in one repository?

I think this is a reasonable question to ask with the above changes. For example the Wasmtime code generator is using *.wit parsing but the JS generator isn't. The JS generator is dealing with lifting/lowering and the Wasmtime code generator isn't. Similarly guests are also pretty different where they're doing lifting/lowering but in the context of their own wasm module as opposed to a guest being instantiated.

In my opinion, though, there's enough shared that this should all still live in the same place. The type hierarchy representation is shared amongst all these use cases, for one. Additionally the lifting/lowering details are shared between the guests and the JS generator. The *.wit parsing is shared between guests and Wasmtime. While it's not quite "everything shares everything" as it is today, I personally feel there's enough overlap for this all to live and be developed in the same repository.

What next?

First and foremost, I think these changes need to be agreed upon. These are massive changes for any non-Wasmtime generator, namely taking components as input rather than *.wit files. Even for Wasmtime things are going to change a lot because the core wasm abstraction layer will be going away and instead Wasmtime's component model support will be used. All that's to say that this requires a lot of deep architectural changes for both users of wit-bindgen and wit-bindgen itself, so agreement should be established first.

Even with agreement on a path forward I don't think there's a great story on how to realize all the changes I describe above. The best idea I have personally is to:

Implement an independent tool that goes from core wasm modules to components using the canonical ABI name mangling.
Use "one giant PR" to atomically move over everything in wit-bindgen to the new architecture.

That "one giant PR" isn't really parallelizable at all and there can't really be any meaningful independent development while that PR is being written unfortunately.

Sep 14 '22 16:09 alexcrichton

Looks great! I have a quick question, you mention decompiling the component using wasmtime-environ, could this approach allow for statically linking components?

Sep 14 '22 19:09 willemneal

Perhaps? I don't think I know what you mean by statically linking components though. My current understanding/intention is that the input to the JS runtime would be a single component which internally might have other components within it but that single component wouldn't be able to import other components (similar to the current restrictions of the Wasmtime-based embedding). In that sense you could statically link components together by bundling them into one large component, but I'm not sure if this is what you are asking for.

Sep 14 '22 19:09 alexcrichton

In that sense you could statically link components together by bundling them into one large component, but I'm not sure if this is what you are asking for.

Yeah I'm wondering about where this tooling fits in.

Sep 14 '22 19:09 willemneal

The tooling for actually creating a statically linked component is somewhat orthogonal to wit-bindgen itself and the host generators; they'll just need to work with whatever is given. I believe the wasm-tools compose subcommand (the wasm-compose crate in the wasm-tools repository) is the initial work towards creating a tool such as this, though (written by @peterhuene).

Sep 19 '22 14:09 alexcrichton

I have discovered what is at least a wrinkle and at most a showstopper for implementing this: WASI. The current wasi_snapshot_preview1 imports are not specified with interface types and are not compatible with interface types either. All existing targets that compile to wasm which wit-bindgen works with, however, use WASI targets. For example Rust today uses WASI, C uses WASI, and hypothetical JS, Python, Ruby, and Go targets all are expected to use WASI as well.

I decided to start on this today by doing the bare minimum: producing a component as part of the build process, just to make sure it can be done for the tests in this repository. This cannot succeed, however, due to WASI imports. The only recourse at this time is to use a non-WASI target like wasm32-unknown-unknown. That has significant drawbacks, however:

Only Rust works with wasm32-unknown-unknown. While C theoretically works I am unaware of any standard toolchain which actually has support for this.
All other targets (JS, Python, Ruby, Go, ...) seem highly unlikely to work with "you can't import anything".
Even in Rust the support is extremely bare-bones, if an assert! trips or similar there's no way to get a message to the user since stdio, for example, doesn't work.

I'm currently debating with myself whether it's worth it to drop support for C, compile Rust with wasm32-unknown-unknown, and just eat the "this is almost impossible to debug" cost. On one hand it is the only way to make progress at this time. On the other hand it is clearly a subpar experience, by a significant amount. The best alternative that I can think of is to write, by hand, a multi-memory-using module which adapts wasi_snapshot_preview1 to some custom wit_bindgen_tests_system_interface or something like that which can be specified with the component model. This would, for example, adapt fd_write on fd 1 to some print(x: string) function imported from the host. Such a core wasm module cannot be written in Rust, though, due to the use of multi-memory, so I don't know how to maintain that (and again it has no viability outside this repository).
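
To make the fd_write example concrete, the adaptation logic would look roughly like the following. This is a sketch in TypeScript operating on an `ArrayBuffer` purely for readability; the adapter described above would be a core wasm module, not JS. `makeFdWrite` is a hypothetical name, `print` is the hypothetical host import mentioned above, and only fd 1 is handled.

```typescript
const ERRNO_SUCCESS = 0;
const ERRNO_BADF = 8; // preview1 `errno::badf`

// Build an `fd_write` lookalike over `memory` that forwards stdout to `print`.
function makeFdWrite(memory: ArrayBuffer, print: (x: string) => void) {
  return (fd: number, iovsPtr: number, iovsLen: number, nwrittenPtr: number): number => {
    if (fd !== 1) return ERRNO_BADF; // this sketch only adapts stdout
    const view = new DataView(memory);
    const decoder = new TextDecoder();
    let text = "";
    let written = 0;
    for (let i = 0; i < iovsLen; i++) {
      // Each iovec is { buf: u32, buf_len: u32 }: 8 bytes, little-endian.
      const buf = view.getUint32(iovsPtr + i * 8, true);
      const len = view.getUint32(iovsPtr + i * 8 + 4, true);
      text += decoder.decode(new Uint8Array(memory, buf, len), { stream: true });
      written += len;
    }
    text += decoder.decode(); // flush any buffered partial sequence
    print(text);
    view.setUint32(nwrittenPtr, written, true);
    return ERRNO_SUCCESS;
  };
}

// Usage: place "hi\n" at offset 32 and a single iovec describing it at offset 0.
const memory = new ArrayBuffer(64);
const bytes = new Uint8Array(memory);
const msg = "hi\n";
for (let i = 0; i < msg.length; i++) bytes[32 + i] = msg.charCodeAt(i);
const view = new DataView(memory);
view.setUint32(0, 32, true); // iovec.buf
view.setUint32(4, msg.length, true); // iovec.buf_len

const printed: string[] = [];
const fdWrite = makeFdWrite(memory, (x) => printed.push(x));
const errno = fdWrite(1, 0, 1, 16);
```

Note that the adapter reads the guest's memory rather than its own, which is where the multi-memory requirement comes from.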

Sep 22 '22 15:09 alexcrichton

The issue I raised about preview1 was discussed at today's wit-bindgen meeting and the conclusion was that we'll write a source-level translation which exports preview1-lookalike things and imports, via wit-bindgen generated stubs, "preview2" things. Currently "preview2" doesn't exist yet in a formalized state that wit-bindgen can import so it would be some adaptation.

The nuances would then be:

This shim module would perform translation from preview1 to preview2. It would need to be instrumented at componentization time in the following ways:
- The linear memory would be imported, not exported
- The stack for this module would be allocated at start with a memory.grow
- This module would import its function table, and the import would get removed.
- This module cannot use any data segments
- This module cannot use any elem segments
- This module would export cabi_realloc which would return a per-function-call return pointer. This ideally fits the use case where each preview1 function call will require at most one return value that needs a malloc
The shim module would be gc'd to be the precise size necessary for the preview1 imports being required
Support for inserting this shim module would get added to the wit-component tool (eventually wasm-componentize as it develops)
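
The cabi_realloc behavior described above can be modeled roughly like this. It is a sketch of the idea in TypeScript rather than the shim itself (which would be a core wasm module); `ReturnArea`, `beginCall`, and the policy of failing on a second allocation within one call are assumptions made for illustration.

```typescript
// Models a fixed scratch region handed out at most once per preview1 call.
class ReturnArea {
  private used = false;
  private readonly base: number;
  private readonly size: number;

  constructor(base: number, size: number) {
    this.base = base;
    this.size = size;
  }

  // Called at the start of each preview1 function call.
  beginCall(): void {
    this.used = false;
  }

  // Same parameter order as the canonical ABI's realloc:
  // (old_ptr, old_size, align, new_size) -> ptr
  cabiRealloc(oldPtr: number, oldSize: number, align: number, newSize: number): number {
    if (oldPtr !== 0 || oldSize !== 0) throw new Error("resizing is not supported");
    if (this.used) throw new Error("only one allocation per call");
    const ptr = Math.ceil(this.base / align) * align; // round up to alignment
    if (ptr + newSize > this.base + this.size) throw new Error("return area too small");
    this.used = true;
    return ptr;
  }
}

const area = new ReturnArea(1024, 256);
area.beginCall();
const p = area.cabiRealloc(0, 0, 8, 16);
console.log(p);
```

Because the region is reused on every call, no general-purpose allocator (and therefore no data segments for one) is needed inside the shim.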

This should provide, as a general purpose shim, a way to migrate from wasi-preview1 to wasi-preview2 in the long term, ideally. For now it should provide a reasonable means by which the wit-bindgen tests can be written and run.

Sep 23 '22 19:09 alexcrichton

At this point this issue is nearing completion. https://github.com/bytecodealliance/wit-bindgen/pull/355 has implemented this change for the Wasmtime host generator and https://github.com/bytecodealliance/wit-bindgen/pull/373 is the implementation for JS. The only remaining piece is the wasmtime-py host generator, which should be pretty straightforward since it can simply copy what JS did.

Overall I'm personally feeling quite good about these changes. Everything seems to fit well together and this all feels like a solid technical foundation to continue building on. Namely the world addition to *.wit I feel will fit naturally within the new structure of all the generators and throughout wit-component as well.

Oct 13 '22 16:10 alexcrichton

This is now finished with the update to the wasmtime-py generator, so I'm going to close this.

Oct 25 '22 14:10 alexcrichton