wasm-tools Function references

This patch implements (most parts of) the frontend for the function references proposal (currently we have not implemented initialised locals). Moreover, the crates shrink, smith, and mutate have yet to be made compatible with the new features --- once the changeset has been approved we can adjust those crates.

Boardly, the implementation of function references follows the proposed spec closely, and as such, this patch introduces some important changes to both the type structure and the validation algorithm.

The structure of ValType is no longer a flat, instead, it contains a new constructor Ref(RefType) which is parameterised by RefType, which is a new struct type. This type is itself parameterised by a HeapType, which is a new enumeration type. In addition, ValType is extended with the constructor Bot to represent the special value bottom type. It is special in the sense that it can only appear during validation as there is not surface syntax for it (this is in accordance with the spec). Concretely, the ValType, RefType, and HeapType are now defined as follows.

pub enum ValType {
    /// The value type is i32.
    I32,
    /// The value type is i64.
    I64,
    /// The value type is f32.
    F32,
    /// The value type is f64.
    F64,
    /// The value type is v128.
    V128,
    /// The value type is a reference. Which type of reference is decided by
    /// RefType. This is a change in syntax from the function references proposal,
    /// which now provides FuncRef and ExternRef as sugar for the generic ref
    /// construct.
    Ref(RefType),
    /// Special bottom type.
    Bot,
}

pub struct RefType {
    /// Whether it's nullable
    pub nullable: bool,
    /// The relevant heap type
    pub heap_type: HeapType,
}

pub enum HeapType {
    /// It seems by example that u32s are directly used for arbitrary indexes,
    /// but maybe a higher-level structure like TypeRef is relevant here?
    Index(u32),
    /// From reference types
    Func,
    /// From reference types
    Extern,
    /// Special bottom heap type
    Bot,
}

The validation algorithm has been adapted to follow that of the pseudocode algorithm in the appendix of the spec. We have made a few administrative changes such as plumbing an instance of WasmModuleResources around, because whenever a reference type has heap type Index(i) we need to look up the defined type at index i. The interface of WasmModuleResources has been extended with the following new methods.

fn type_index_of_function(&self, func_idx: u32) -> Option<u32>;
fn matches(&self, t1: ValType, t2: ValType) -> bool;
fn check_value_type(
    &self,
    t: ValType,
    features: &WasmFeatures,
    offset: usize
) -> Result<(), BinaryReaderError>;

These methods are used during validation:

type_index_of_function is necessary to obtain the result type of RefFunc
matches implements the subtyping relation on types; it is used in both operand validation and in element validation. It may need to peek into context to retrieve function types.
check_value_type is also used by both operand and module validation.

The validation procedure has obviously been extended with the new instructions; the validation of a few previous instructions has been updated:

Select computes the least upper bound of the two operand types.
RefFunc has been changed to match its new definition.
TableCopy uses subtyping (matches relation) rather than (structural) equality in accordance with the updated specification.
RefIsNull has been adjusted to use the new pop_ref method (see below).

The validation helper methods have changed too.

fn pop_operand(
    &mut self,
    expected: Option<ValType>,
    resources: &impl WasmModuleResources,
) -> OperatorValidatorResult<ValType>

fn pop_ref(
    &mut self,
    resources: &impl WasmModuleResources,
) -> OperatorValidatorResult<RefType>

Note that the interface of pop_operand has changed: it no longer returns an Option<ValType>, instead it returns simply a ValType (wrapped in an OperatorValidatorResult as before); the new Bot type subsumes all previous instances of Unknown which were represented by None. This change makes the validation logic more uniform. The new method pop_ref pops a reference type off the operand stack; it errors if the stack is empty or there is a type mismatch.

Comments and feedback are most welcome --- we have tried to follow the structure of the code base to the best of our abilities, however, we are not completely sure we have put the shared logic (e.g. type_index_of_function, matches, and check_value_type) in the right places. Also, help with getting the crates shrink, smith, and mutate up to date would be much appreciated.

Additionally, we have another patch that makes wasmtime compile and pass the testsuite (without function references) with our modified wasm-tools (c.f. https://github.com/effect-handlers/wasmtime/pull/3).

Aug 05 '22 23:08 dhil

Thanks for this! This'll take some time to digest but I hope to get at least an initial review of this in the next day or two. In the meantime though it looks like there's some CI failures which may need fixing up. (looks like other crates in the repo which need updates)

Aug 08 '22 15:08 alexcrichton

As a heads up I just merged https://github.com/bytecodealliance/wasm-tools/pull/697 which is a large restructuring of the validator. I'm happy to help rebase this on top of that if y'all have issues with it.

Aug 10 '22 18:08 alexcrichton

Thanks @alexcrichton! Regarding your comment about #697, we would very much appreciate some help rebasing. I don't know what the best course of action is here: do we refactor our implementation first and then rebase, or vice versa?

Aug 16 '22 12:08 dhil

I think it's up to y'all whether you'd like to refactor first or rebase. I mostly just wouldn't recommend doing them in parallel. If you'd like I can try to do the rebase later today.

Aug 16 '22 14:08 alexcrichton

Ok here's a summary of the remaining outstanding issues with this PR that I'm aware of:

The representation of ValType uses some #[repr(packed)] tricks to be 4-bytes but this means that you can't have more than 65k function types in a module. Currently we allow up to one million types so this probably needs to be extended in the future. This notably affects the ref.func instruction currently where ref.func will not validate, even if the function-references proposal is disabled, if the type index is too large.
The performance of the validator has regressed ~15-20% locally in benchmarks. My guess is that this is largely due to the 4x size increase of ValType from one byte to four. I have looked and I don't know how to recover this performance easily. Some local benchmarking shows that wasmparser, as-is on main, is already ~2x slower than v8's validator (compared against node.js), so this isn't necessarily a showstopper.
It's possible to construct a module that is valid but takes ~infinite time to validate.
The HeapType::Bot variant is exposed in the public API despite all consumers basically only going to use unreachable! in that case. The current "bottom type" is encapsulated in the validator and this "heap bottom type" is not.

Personally I would consider the threshold for merging this PR to be along the lines of "if you're not using the function-references proposal you're not affected". In that sense (3) and (4) seem ok. They can always be fixed as a follow-up before turning on this proposal by default. I think (1) can be solved by having a conditional implementation of visit_ref_func depending on whether the function_references proposal is enabled or not. (I know I originally pushed for not having checks for "is function-references enabled" but the implementation limit is causing me to backtrack on that).

For (2) I don't really entirely know what to do about that. I'm not aware of any current embedding of wasmparser where validation performance is of the utmost importance. For example while we care about it in Wasmtime Cranelift compilation time always dwarfs the validation time by a huge amount. Along those lines, in a macro sense, the slowdown may be inevitable until someone puts in more elbow grease to optimize the validator. Our validator is certainly by no means the highest-of-performance already in that there's a lot of "we prefer simpler code" idioms throughout it.

Overall though I'd like to get a second opinion on these points and the state of the PR itself. @fitzgen would you be up for being a second pair of eyes here?

Feb 06 '23 16:02 alexcrichton

I fixed (4) in https://github.com/effect-handlers/wasm-tools/pull/38

Feb 06 '23 18:02 alexcrichton

I'm not aware of any current embedding of wasmparser where validation performance is of the utmost importance.

+cc @Robbepop. You've sent PRs improving wasmparser performance in the past, is wasmparser performance critical for wasmi?

Feb 07 '23 19:02 fitzgen

The representation of ValType uses some #[repr(packed)] tricks to be 4-bytes but this means that you can't have more than 65k function types in a module. Currently we allow up to one million types so this probably needs to be extended in the future. This notably affects the ref.func instruction currently where ref.func will not validate, even if the function-references proposal is disabled, if the type index is too large.

Have we checked what implementation limits others have? If V8/SM/JSC have the same limits we have than it isn't really an issue. But we should probably at least have our limit as high as the lowest of them.

Feb 07 '23 19:02 fitzgen

I'm not aware of any current embedding of wasmparser where validation performance is of the utmost importance.

+cc @Robbepop. You've sent PRs improving wasmparser performance in the past, is wasmparser performance critical for wasmi?

@fitzgen In wasmi the parsing and validation performance is kind of critical since it makes up for a good chunk of the total runtime especially in our use case at Parity were we run Wasm smart contracts with it. Those smart contracts are usually rather small Wasm blobs (5-50 kilobytes) and execute pretty quickly due to the fact that users have to pay for every cycle of computation.

Being an interpreter wasmi naturally is optimized for fast parsing, validation and translation speed at the cost of execution speed.

Feb 07 '23 19:02 Robbepop

The representation of ValType uses some #[repr(packed)] tricks to be 4-bytes but this means that you can't have more than 65k function types in a module. Currently we allow up to one million types so this probably needs to be extended in the future. This notably affects the ref.func instruction currently where ref.func will not validate, even if the function-references proposal is disabled, if the type index is too large.

Have we checked what implementation limits others have? If V8/SM/JSC have the same limits we have than it isn't really an issue. But we should probably at least have our limit as high as the lowest of them.

We (SpiderMonkey) have a limit of 1 million types. I believe the other engines are the same because that is the value in the JS-API spec.

For us, we eagerly canonicalize type definitions when parsing the type section. Then our type representation is 8-bytes on all platforms, containing a 8-bit type code, 1-bit nullability flag, and then 32 or 48-bit packed pointer to a type definition. We store a side hash table for turning the type definition pointer to a module local type index for error messages.

Feb 07 '23 20:02 eqrion

To be clear the (1) limitation is not the end of the world, if it only affects validation when function-references is enabled. In that case we can leave it as a "fix this before turning this on" feature. I think all that's necessary is what I outlined above, conditionally implementing the ref.func validation. Otherwise though I think it's clear that 65k isn't enough and we'll need to play more representation tricks with the type information.

@Robbepop would you happen to have benchmarks you'd be able to execute on-hand? It'd be great to see the impact of this specific PR on your embedding to see if the 20% slowdown in just validation that I measured locally is both reproducible and also to see what portion of your benchmark is slowed down as well. If not though, no worries!

Feb 07 '23 20:02 alexcrichton

@alexcrichton I ran a benchmark on our CI. Of interest are the translate_ benchmarks which show to have regressed by roughly 5% all in all with the wasmparser version of this PR. The benchmarks are a bit noisy unfortunately. https://github.com/paritytech/wasmi/pull/656#issuecomment-1421503362 Note that the translation benchmarks test the performance of parsing, validation and Wasm -> wasmi bytecode translation of Wasm modules, so Wasm validation is just a part of the whole test. Therefore it is expected that the 20% won't show up.

Feb 07 '23 21:02 Robbepop

Looking at the representation of ValType (and RefType and HeapType) I think we could do a bit better.

We need at least 20 bits to represent at least 1 million type indices.

If we represent ValType as a u32, use the first byte as a discriminant, and left the other three bytes (24 bits) available to each variant as they see fit, that is more than enough for our 1 million types.

Thoughts?

Feb 08 '23 23:02 fitzgen

I do think a representation like that would work, yeah, but I also want to avoid going too far down this rabbit hole just yet. I think @dhil and @CosineP have subsequent work based on this PR they're interested in making progress on so in the spirit of that I don't think that the 65k-limit is a blocker for this PR. I believe there's a fair amount of API design work to get a bit-packing API working since it won't be as straightforward as today's match used extensively throughout wasmparser and consumers, so separating out that change to a standalone PR I think is worthwhile.

Feb 08 '23 23:02 alexcrichton

Alex and I just talked offline about this PR and its performance regression. We agreed that ultimately the regression is acceptable. It is pretty much a fact of life that ValType is going to have to grow in size to accommodate future Wasm proposals, not just the func refs proposal but the GC proposal as well. There certainly is room for performance improvement in the crate and its validator, but it doesn't need to hold up this PR from landing, since it isn't something we can just change about this PR; it is going to need to be wide spread and touch a lot of wasmparser to clean up idioms all over the place and shrink sizes of various things.

Feb 08 '23 23:02 fitzgen

I would very much appreciate to have a wasmparser release without this PR merged and the fix for the floats feature so that wasmi can sit on it for a while until the performance regressions are hashed out eventually. (gives us time)

Concerning the ValType layout optimization: Wouldn't it be possible to have an optimized ValTypeOpt type that is used internally and only giving out the ValType type as a user facing element? If the conversioln is fast and the optimized ValTypeOpt is making things faster this could be a solution that does not introduce major breaking changes.

Feb 09 '23 07:02 Robbepop

Ok yeah if that works for y'all @Robbepop I'm happy to do that. I'm going to try to publish new versions today and I'll do that before merging this.

For the layout of ValType, I should clarify that I don't know if it's 100% responsible for the 15% performance regression. I know that it's inflated from the prior 1 bytes to 4 bytes in this PR and I think that's contributing to the regression, but I don't think it completely explains it. I have been having difficulty digging in and figure out what else to optimize here.

Feb 09 '23 20:02 alexcrichton