wasmtime Support for stack switching proposal

This PR adds support for the Wasm stack switching proposal. The explainer document for the proposal is here.

This means that the following instructions are added: cont.new, resume, suspend, switch, and cont.bind. The instruction resume.throw, which is part of the proposal but relies on exception handling, is currently unsupported.

This was developed together with @dhil.

General implementation notes

In Wasm, continuations are represented by values of type (ref $ct), where $ct is a new composite type/heap type for continuations. In the implementation, these are represented by values of type VMContObj. These are fat pointers, consisting of a sequence counter, and a pointer to a VMContRef. The latter type is used for the actual representation of continuations. The sequence counter part of VMContObjs is used to check that every continuation value can only be used once.

The StoreOpaque is extended to contain a "stack chain": It indicates what stack we are currently executing on. Logically, this is a linked list of stacks, since each continuation has a parent field. The chain stored in the StoreOpaque always ends with a value representing the initial stack. This is the stack we were on when entering Wasm, which will usually be the main stack.

Memory Management

Currently, memory management is very basic: The StoreOpaque provides a new method for allocation a new VMContRef, and keeps track of all continuations created this way. Continuations are never deallocated at the moment, meaning that they live until the store itself is deallocated.

The stack memory used by each allocation (represented by type ContinuationStack) is always mmap-ed, with a guard page at the end. There is currently no support for growing stacks, or segmented stacks.

Backtrace generation

The existing backtrace generation mechanism is extended to be fully aware of continuations: When generating backtraces, the entire chain of continuation is traversed, not just the stack frames of the currently active stack/continuation.

Integration with GC

Integration with the GC proposal is limited, but should at least be safe:

Programs where continuation objects would be stored inside GC objects (structs, arrays, ...) are rejected
During GC, all stack frames of all continuations are inspected when looking for GC roots.

Entering/Exiting Wasm

Prior to this PR, there were two separate mechanisms that save and restore some state of the runtime when entering/exiting Wasm:

The functions enter_wasm and exit_wasm in func.rs
CallThreadState saves and restores (on drop) parts of the VMRuntimeLimits

This PR consolidates these two mechanism, because it requires some additional state to be updated and restored on enter/exit: the type wasmtime::runtime::func::RuntimeEntryState now stores all of the required runtime state and ensures that it's restored when exiting Wasm.

Limitations

The limitations are as follows, some of which were mentioned above:

resume.throw needs to be implemented once exception handling is there
Full integration with GC and support for deallocating continuations
The only supported platform at the moment are unix-like x64 systems. I'm planning to change that soon as follow-up PRs, this won't require big changes to what's in this PR at the moment.
Neither Winch or Pulley are supported at the moment.

Feb 03 '25 18:02 frank-emrich

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "wasmtime:c-api", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

fitzgen: wasmtime:ref-types

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Feb 03 '25 20:02 github-actions[bot]

There are currently a few test failures that I 'm hoping to get some help with:

I don't understand cargo vet well enough to know what's up, but the problem should be fairly benign: We are using memoffset in a different crate now, but from what I can tell, it has already been vetted for use in this project.
There are some build failures due to the new instructions not being supported on all platforms. In general, we've added a Wasm feature and a cargo feature for stack switching. My plan was that all stack switching tests would be skipped if the stack-switching cargo feature is not enabled. That's easy to do for the all test suite, but I'm not sure what the best approach is for the wast tests to achieve this. Am I understanding correctly that the wast test suite always currently runs all tests, independently from what cargo features are enabled?

As a general note, so far most of the stack switching code is compiled even if the cargo feature is disabled. It would be possible to get more aggressive with conditional compilation, but this requires some ugly special casing in backtrace.rs. Happy for suggestions if people would like to see more conditional compilation or are happy with keeping it minimal.

Also, I had to re-bless most of the disas tests since the layout of the VMContext changed due to the addition of some new fields. That has increased the surface area of this PR even more.

Feb 04 '25 15:02 frank-emrich

Am I understanding correctly that the wast test suite always currently runs all tests, independently from what cargo features are enabled?

That is correct, and we assert that tests that "should" fail do in fact fail. This way, if we fix bugs or expand support, we won't accidentally and silently miss out on tests for those things.

This should be fine for your stack switching tests, as long as you

mark the tests as "should fail" on supported platforms, and
return an error or some sort (rather than unsafely segfault or whatever) when processing Wasm that uses this proposal on an unsupported platform/configuration.

See https://github.com/bytecodealliance/wasmtime/blob/d7d605c236a857a4019aa39850e9dd6a6597ece1/crates/wast-util/src/lib.rs#L280

Feb 05 '25 16:02 fitzgen

Oh sorry I should have left a note here but I think I forgot. I was out last week for a company offsite and I'm out this week for the CG meeting, but I plan on getting to this next week. I want to make sure I've got plenty of time to sit down and go through this, but wanted to give a heads-up that I'll be looking into it next week at the earliest.

Also to reiterate: thank you so much for this! Thanks for wrestling with CI as well, much appreciated!

Feb 12 '25 18:02 alexcrichton

@alexcrichton

Thanks so much for having a look at this already!

Regarding my own time commitment for this PR: I'm starting a new job in mid March, so things are looking as follows:

Until end of February: Focusing exclusively on getting this PR in shape
First two weeks of March: Working on this PR, but some distractions due to moving, etc.
From mid March: There will be an initial period of me settling in at my new job during which I can't do much on this. After that the Wasmtime/stack switching work will not part of my day job anymore, but I'll still be around to help out, address bugs, etc. I also have a few things on top of the current PR already done or planned that I want to add at that stage (aarch64 support, Windows support, better DWARF support, ....).

@dhil should also be able to help out here and there.

In an ideal world this PR would be split up into chunks and landed piecemeal, for example parsing support, then compiling support, then maybe an instruction-at-a-time for each piece, etc.

I think there was just a bit of a miscommunication between me and @dhil. I thought you wanted to see this as one big chunk. There's a simple way in which this PR could be split up, leading to a sequence of PRs as follows:

Integrate tag support, with nothing stack switching related, yet.
Add dummy field to VMContext with the size and position that the stack switching field will later occupy. This way, all the changes to the disas tests will happen in this PR. I'm not sure if the disas tests actually contribute to Github choking on this PR, but at least getting them out of the way will reduce the amount of conflicts coming in.
Everything except the actual Wasm -> CLIF translation. This will be a big chunk where most of the integration of stack switching with the rest of Wasmtime goes.
The Wasm -> CLIF translation. This will mostly be what's currently in crates/cranelift/src/stack_switching/, in particular instructions.rs. The latter is 2500 LOC, so reviewing that separately seems worthwhile. I may even be able to split this up further.

This is far from perfect because PR 3 and 4 will still be large and not really split along separate features of the stack switching implementation, but at least that split up would be easy for me to do. I may discover some more chunks that can be factored out into separate PRs on the way.

My only concerns about this is the following: Given that Github doesn't really support stacked PRs, we would work through these 4+ PRs linearly, making it difficult to point at why we need something in the next PR.
On the other hand, I could just put a branch somewhere that shows everything that's still missing at any point so you can have a look at how something will be used in the remaining PRs.

Please let me know what you think. I will also make sure to work through the comments you already added to this PR so they don't get lost if we do decide to split this PR.

Edit: One concern I forgot to mention regarding splitting this PR up is testing: We only have a few scattered unit tests, most of our testing happens through whole Wasm programs. That means that if we split the PR, we will only be able to add tests in the end, when all pieces are in place.

Feb 17 '25 13:02 frank-emrich

Regarding my own time commitment for this PR

Ok that all sounds good, thanks! (very helpful to set expectations!)

I think there was just a bit of a miscommunication between me and @dhil.

Oh no that's on me. I remember telling @dhil that and I think I was just basically wrong, so that's on me for leading y'all astray. Given the size of this (not just the auto-updated tests but also the scale of the implementation) along with the subtelty of the implementation in retrospect I just didn't account for that and misled y'all. If possible I think the best way to handle this would be to peel off PRs from this one and submit them independently.

Overall my goal is to ensure that we get a good chance to review everything coming in and reducing the chance of something being lost by accident due to being overlooked. For coordination what I would imagine is that it's completely reasonable to have lightly, if at all, tested intermediate states. Additionally it's totally reasonable to leave a "hole" in the code with something like // more coming soon ... (e.g. making a hole in the VMContext for the stack-switching state necessary for a future PR). Let's keep this PR open so it can be cross-referenced if needed, and this can be rebased occasionally as smaller pieces are landed.

Does that seem ok? It's a similar strategy we're taking for landing async component model support in Wasmtime where it's landing piece-by-piece. Each piece is not 100% thoroughly tested but the "final PR" has lots of tests. We try to keep track of outstanding TODO items in issues occasionally and otherwise are ok deferring work to a future PR which has more tests and more fully justifies a prior design.

Your sequence of PRs makes sense to me as well. I'd mostly defer to y'all for pieces to split up as you're the most familiar with the implementation, and if something seems too big I can bring it up during the review but by naturally splitting this up I think most slicing/dicing will be quite reasonable.

Feb 18 '25 17:02 alexcrichton

Overall my goal is to ensure that we get a good chance to review everything coming in and reducing the chance of something being lost by accident due to being overlooked.

Yes, absolutely. I'm happy to do the necessary chopping. I'll create a tracking issue describing the bigger picture and where we currently are regarding the PRs that have and haven't landed at any time.

The outline of PRs I mentioned in my previous comment will still lead to fairly large ones, but I will see if there's potential to split this up further and wait for your feedback once I've split off a candidate for a PR.

Also, sorry for making you wade through lots of todo!(), unimplemented!() and other little things that we hadn't resolved before creating this PR.

Feb 18 '25 21:02 frank-emrich

Sounds good! It's ok to skip the step of writing out a plan in an issue and then executing on the plan, we can work on making sure things are documented before you wind down but up until then it's ok if it's just in a few heads. In that sense no need to exhaustively document "here's the round peg that's gonna fit in that round hole" and we can just have one issue for tracking known items that are already in-tree.

And please no apologies necessary! Landing big changes I've found is always a bit of an art and if anything I'm the one who led you astray by being overconfident!

Feb 18 '25 23:02 alexcrichton

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "fuzzing"

Thus the following users have been cc'd because of the following labels:

fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Feb 19 '25 21:02 github-actions[bot]

Converting this back to a draft now because it will be split up into multiple PRs.

Over the next few weeks, I expect to rebase this PR whenever parts of it have landed. This way, this PR will always contain whatever is left.

I've also created issue #10248 to track the overall status. I will also use that issue in various FIXMEs for missing features.

Feb 19 '25 23:02 frank-emrich

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to complete this check list:

[ ] If you added a new Config method, you wrote extensive documentation for it.

Our documentation should be of the following form:

Short, simple summary sentence.

More details. These details can be multiple paragraphs. There should be
information about not just the method, but its parameters and results as
well.

Is this method fallible? If so, when can it return an error?

Can this method panic? If so, when does it panic?

# Example

Optional example here.

[ ] If you added a new Config method, or modified an existing one, you ensured that this configuration is exercised by the fuzz targets.

For example, if you expose a new strategy for allocating the next instance slot inside the pooling allocator, you should ensure that at least one of our fuzz targets exercises that new strategy.

Often, all that is required of you is to ensure that there is a knob for this configuration option in wasmtime_fuzzing::Config (or one of its nested structs).

Rarely, this may require authoring a new fuzz target to specifically test this configuration. See our docs on fuzzing for more details.
[ ] If you are enabling a configuration option by default, make sure that it has been fuzzed for at least two weeks before turning it on by default.

To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the .github/label-messager.json configuration file.

Learn more.

Feb 21 '25 02:02 github-actions[bot]

Support for stack switching proposal

General implementation notes

Memory Management

Backtrace generation

Integration with GC

Entering/Exiting Wasm

Tags

Limitations

Subscribe to Label Action

Subscribe to Label Action

Label Messager: wasmtime:config