Design question: How to make source generation more readable
The current design of using a "source builder" has a drawback: it makes the code harder to read.
self.src.ts("Record<");
self.print_ty(iface, &r.fields[0].ty);
self.src.ts(", ");
self.print_ty(iface, &r.fields[1].ty);
self.src.ts(">");
vs
let first = &record.fields[0].ty;
let second = &record.fields[1].ty;
self.src.ts(&format!("Record<{first}, {second}>"));
The latter assumes that a custom Display for Type was implemented. Obviously this would go against the current design.
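Because printing a type usually needs the interface as context, one way to get a Display implementation is a small adapter that borrows both. The following is only a sketch under that assumption: Interface, Type, TsType and record_of are simplified, hypothetical stand-ins, not wit-bindgen's actual definitions.

```rust
use std::fmt;

// Hypothetical stand-ins for wit-bindgen's `Interface` and `Type`.
struct Interface {
    type_names: Vec<String>,
}

enum Type {
    U32,
    String,
    Id(usize), // index into `Interface::type_names`
}

// Adapter pairing a type with the context needed to print it.
struct TsType<'a> {
    iface: &'a Interface,
    ty: &'a Type,
}

impl fmt::Display for TsType<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.ty {
            Type::U32 => f.write_str("number"),
            Type::String => f.write_str("string"),
            Type::Id(i) => f.write_str(&self.iface.type_names[*i]),
        }
    }
}

// With the adapter in place, `format!` can interpolate types directly.
fn record_of(iface: &Interface, first: &Type, second: &Type) -> String {
    let first = TsType { iface, ty: first };
    let second = TsType { iface, ty: second };
    format!("Record<{first}, {second}>")
}
```

The adapter keeps the call site short, but every call to format! still allocates a new String instead of appending to the shared buffer.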
So my question is whether there is a middle ground, such as a macro like format! but tailored to this design:
to_src!(self.src.ts, iface, "Record<{first}, {second}>");
// which becomes
self.src.ts("Record<");
self.print_ty(iface, &first);
self.src.ts(", ");
self.print_ty(iface, &second);
self.src.ts(">");
If you find this macro useful, would you advise writing it with macro_rules! or as a proc macro? I'm leaning toward the latter.
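For comparison, here is a rough macro_rules! sketch. A declarative macro cannot look inside a format string, so this variant takes alternating string literals and type expressions rather than "{first}" placeholders. Gen, Source, Interface, Type and print_record are hypothetical stand-ins for the real generator types.

```rust
// String literals are appended verbatim; any other expression is
// treated as a `&Type` and routed through `print_ty`.
macro_rules! to_src {
    // Base case: nothing left to emit.
    ($gen:expr, $iface:expr) => {};
    // Next item is a literal fragment.
    ($gen:expr, $iface:expr, $lit:literal $(, $($rest:tt)*)?) => {
        $gen.src.ts($lit);
        to_src!($gen, $iface $(, $($rest)*)?);
    };
    // Next item is a type to print.
    ($gen:expr, $iface:expr, $ty:expr $(, $($rest:tt)*)?) => {
        $gen.print_ty($iface, $ty);
        to_src!($gen, $iface $(, $($rest)*)?);
    };
}

// Hypothetical stand-ins for the generator and its source buffer.
struct Interface;

enum Type {
    U32,
    String,
}

#[derive(Default)]
struct Source {
    out: String,
}

impl Source {
    fn ts(&mut self, s: &str) {
        self.out.push_str(s);
    }
}

#[derive(Default)]
struct Gen {
    src: Source,
}

impl Gen {
    fn print_ty(&mut self, _iface: &Interface, ty: &Type) {
        match ty {
            Type::U32 => self.src.ts("number"),
            Type::String => self.src.ts("string"),
        }
    }

    fn print_record(&mut self, iface: &Interface, first: &Type, second: &Type) {
        to_src!(self, iface, "Record<", first, ", ", second, ">");
    }
}
```

Parsing placeholders out of a single format string, as in the to_src! call proposed above, would need a procedural macro; the declarative form trades that syntax for staying inside the existing crate.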
I agree that the source code generation isn't so readable right now, but I would prefer to not add a layer of indirection through a procedural macro because that seems like it would run the risk of complicating things even further. I'm all for better ways to design all this but I think making a whole new proc-macro crate is a bit overkill for this.
Good point. Was the original design an optimization? Why not have the generator trait's methods return strings?
The way it is right now is largely just how I ended up making it in the first place. I originally wrote it this way in wasm-bindgen as well. There's not necessarily a reason one way or another, although I do believe it's theoretically more efficient to build a string rather than dealing with lots of string fragments.
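To make the two styles concrete, here is a minimal sketch using a hypothetical, simplified Type enum (not the real wit-bindgen type). The first function appends to one shared buffer; the second returns a fresh String at every level and so allocates intermediate fragments for nested types.

```rust
// Hypothetical, simplified type representation.
enum Type {
    U32,
    String,
    List(Box<Type>),
}

// Style 1: every call appends to a single shared buffer.
fn print_ty(out: &mut String, ty: &Type) {
    match ty {
        Type::U32 => out.push_str("number"),
        Type::String => out.push_str("string"),
        Type::List(inner) => {
            out.push_str("Array<");
            print_ty(out, inner);
            out.push('>');
        }
    }
}

// Style 2: every call returns a new String, which the caller
// then copies into its own result.
fn ty_to_string(ty: &Type) -> String {
    match ty {
        Type::U32 => "number".to_string(),
        Type::String => "string".to_string(),
        Type::List(inner) => format!("Array<{}>", ty_to_string(inner)),
    }
}
```

Both produce the same text; the difference is only in how many temporary strings are created along the way.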
Well, given the efficiency Rust already provides and the small footprint most wit files will have, do you think it's worth a rewrite?
If so, I'd give it a go. Would be helpful in other contexts to get the string from a type.
For me it's mostly a matter of ergonomics because, as you say, performance isn't the most critical concern in this context. That being said, having written code generators that return strings from most methods, I find the "build into a string" style easier to work with and more pleasant as it scales. In that sense, even irrespective of performance, I don't think I would choose an alternative design.
In https://github.com/bytecodealliance/wit-bindgen/issues/170#issuecomment-1082286008 you mentioned that it may be a good time to revisit wit-bindgen's code generation. Are there any patterns or abstractions you might want to extract out of the code or small incremental wins to be made?
I know I was/am having trouble implementing my wit-bindgen-wasm3 port because I don't fully understand the nuances of the code generated by wit-bindgen, so I would be interested in helping if you can point me in the right direction.
I don't personally have any ideas myself, I find this sort of problem inherently complex because wit-bindgen is basically a compiler and structuring compilers well is not a trivial task. Historically I've found that it's easy to get trapped in "local minima" where one thing is nicer at the cost of many others. That's probably where wit-bindgen is at right now since I mostly just wrote it as I went along, but I don't know how best to get it over to a different spot that is easier to understand and modify.
I'm not sure if it helps, but when I need to write a proc-macro I'll normally go through a process similar to the one outlined in this article from Ferrous Systems:
- Parse the inputs (already done by Interface::from_file() and friends)
- Construct an intermediate representation which contains the logical components I want to generate code for, but is agnostic of the way it will be rendered
- Write various render_xxx() functions which use quote!() to generate a TokenStream for each component, where the tokens for parent components are constructed by composing the tokens returned by each child element's render_xxx() function. State specific to how things are represented (e.g. options or a namespace table containing the variables active in the current scope) might be passed around as function arguments.
Something I like about this approach is that it is more declarative than procedurally generating code, and it's very easy to write unit tests to make sure a particular part of the generated code is what you expect.
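As a rough illustration of the steps above, the sketch below uses plain Strings instead of quote!() and TokenStream so that it has no dependencies. The Field and Record types and the render_field/render_record functions are hypothetical, and the TypeScript output shape is only an example.

```rust
// Hypothetical intermediate representation: it records what should be
// generated, not how it is rendered.
struct Field {
    name: String,
    ty: String,
}

struct Record {
    name: String,
    fields: Vec<Field>,
}

// Render a single child component.
fn render_field(field: &Field) -> String {
    format!("  {}: {};", field.name, field.ty)
}

// Render the parent by composing the output of its children.
fn render_record(record: &Record) -> String {
    let fields: Vec<String> = record.fields.iter().map(render_field).collect();
    format!(
        "export interface {} {{\n{}\n}}",
        record.name,
        fields.join("\n")
    )
}
```

Because each render function is a pure function of its IR node, a unit test can check one component's output without running the whole generator.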
In some sense that's sort of what happens today, although the intermediate representation is currently shared amongst all language binding generators, which means that it's not as close to the final product as it could be. This does, however, avoid the need to define a new intermediate representation for each language we're generating bindings for, and it keeps language binding generation consistent (e.g. only Rust can use TokenStream; other languages can't necessarily).
I'm going to close this because I don't think it's particularly actionable at this time. We're in the process of splitting apart this repo with host generators going to live in their own homes instead of having everything in a single repository. In the fullness of time it's possible for each generator to be written entirely differently which I think would encourage multiple ways of writing/etc and hopefully can be more readable over time too.