binaryen TypeBuilder woes

Say a compiler uses Binaryen as its backend, and the compiler reuses as much as possible from Binaryen, incrementally generating IR nodes and types while compiling a program. Now say there is the following simple program:

class A {}
class B extends A {}
new A();
new B();

Upon encountering new A(), the compiler generates a struct.new instruction and uses a TypeBuilder to build the respective heap type of A for use with the instruction. This one is simple, since there's just a single slot with the structure of A, and A has no super type. It's not yet known whether B will be used, and the compiler doesn't register a heap type for B eagerly.

Upon encountering new B() later, the compiler again generates a struct.new instruction and uses another TypeBuilder to build the respective heap type of B. This time, it populates the TypeBuilder with both the supertype A and the subtype B to establish the subtyping relationship.

In isorecursive mode, incrementally building types like this works fine, since if types are structurally identical, a canonical heap type is returned (iiuc). In nominal mode, however, incrementally building types like this behaves differently, in that upon each build fresh types are created, so there are both an A and a B <: A', with mismatching As.

To avoid this problem, I figured that I might well need an entire separate IR, so code can be generated before a final TypeBuilder is used to build the full nominal hierarchy in one go, only then translating the separate IR to Binaryen IR. Now, I'd like to use nominal mode but creating a separate IR seems unfortunate, as it would also require reinventing lots of wheels like an interpreter and so on, with the result that using Binaryen as a backend doesn't make a lot of sense in the first place anymore.

Questions: Is it actually a valid use case that I want to use nominal instead of isorecursive mode? And if so, any ideas how nominal mode can be supported in incremental codegen like this as well, without reinventing half of Binaryen? :)

(I guess this didn't come up earlier since, if Binaryen is solely used as an optimizer, this is a non-issue.)

Some ideas:

When creating a TypeBuilder, allow adding already built types to it if a fresh type is not desired
Allow reusing the same TypeBuilder forever, not destroying it when types are built
Integrate type building more closely and have a finalization step at the end (but that's probably much more work)

Sep 15 '22 00:09 dcodeIO

@tlively can answer this best, but another option that might (not sure) be relevant for such compilers:

Run two passes, in the first of which you collect type information but emit no code. Then you create the types. Then the second pass emits code and uses the types already built.

Sep 15 '22 15:09 kripken

The other option is about what led me to consider a separate IR. In complex cases, say when there is inheritance involving generics, where concrete type parameters are only known when code is as-good-as compiled already (i.e. class C<T,U> extends B<T> extends A), one will likely want the full information already, which would then live on the AST, that then becomes not unlike a separate IR. So far, we've avoided that exactly for the reason that generating Binaryen IR is cheap, so it can as well be done right away, with the added benefit that we can have maximum reuse of what Binaryen already provides. There is also the aspect that sometimes, to know what a type parameter is concretely, some amount of precompute will have to be performed. When storing the info on the AST-then-IR, that'd then lead to a separate AST-then-IR interpreter.

Sep 15 '22 15:09 dcodeIO

I see. Yeah, maybe that does lead to a separate IR. I was hoping perhaps a first pass could just build a "type IR", then between the passes build up a map of expression* => type, then use that map in the second pass. But maybe that "type IR" is fairly complex...

Sep 15 '22 16:09 kripken

One option similar to the two-pass approach would be to take the most recently generated version of A and backpatch all the previous uses of A to use the new version instead. But that would get expensive if you end up generating a lot of versions of the same type.

Another option is to give the type builder the ability to set arbitrary already-created types as the supertype, not just other newly-created types. This should "just work" for both nominal and isorecursive modes. @dcodeIO, would you be interested in making a PR that replaces TypeBuilder::setSubType(size_t i, size_t j) to TypeBuilder::setSubType(size_t i, HeapType super)?

Sep 15 '22 17:09 tlively

Another option is to give the type builder the ability to set arbitrary already-created types as the supertype, not just other newly-created types. This should "just work" for both nominal and isorecursive modes. @dcodeIO, would you be interested in making a PR that replaces TypeBuilder::setSubType(size_t i, size_t j) to TypeBuilder::setSubType(size_t i, HeapType super)?

This sounds like about what I need, but would ultimately also include using existing types as field types of structs, element types of arrays, if I'm not missing something. Would, perhaps, an API that allows setting / using existing types as part of a type builder, be possible, and does it seem like that would cover all the cases to you as well?

Edit: Oh, I see that TypeBuilderSetStructType etc. already take BinaryenTypes, so your suggestion might already be sufficient, if these BinaryenTypes can be existing types.

Sep 15 '22 17:09 dcodeIO

Yeah, TypeBuilder already allows existing types to be used as fields in the new types. Only supertypes are special in that they have to be other temporary types right now, but there's no real reason for that restriction.

Sep 15 '22 17:09 tlively