
Improvement: Backend-independent code generation

Open EduardoGRocha opened this issue 5 years ago • 3 comments

Hello,

First of all, I'd like to share very positive feedback on the project. Last semester I completed a project in which coat was used to compile code for regex matching. The results were really impressive, as can be seen in the final report.

I'm currently working on a tiny query engine that uses coat for code generation. My impressions are that even though the framework is very easy to use, it has the following weak-spots:

  1. **Too much templated code pollutes the engine source:** Every part of the engine that uses coat needs to be templated. This template-heavy code propagates through the rest of the engine until, eventually, a large part of the engine becomes header-only.
  2. **Adaptive execution with coat is a bit cumbersome:** When doing adaptive execution, one needs two different coat::Function objects that go through exactly the same code generation steps but emit into different backends. Ideally, the user would want to write something like `f1 = fn.finalize(backend_1); f2 = fn.finalize(backend_2);` and inject the backend only at the end.

One solution would be to create something analogous to UMBRA IR (described here): first generate an intermediate representation, then emit machine code for the chosen backend.

This solves both problems: the code that generates the intermediate representation is template-free, and different runtimes can be injected to translate the intermediate representation into machine code.

I want to implement this feature. Would that be integrated into coat? Do you have any thoughts to share about it? Should we discuss more on the exact design?

Looking forward to your feedback

EduardoGRocha avatar Jan 15 '21 15:01 EduardoGRocha

Hello,

thanks for the interest and for trying it out in one of your projects.

I'm well aware of the template problem, but there is no simple answer to this. Just keep in mind that it is not beneficial to generate code for "everything". A lot of things can be implemented in normal C++ code and be called from the query-specific generated code without any performance penalties. It's just a normal function call. Only generate code for query-specific parts which are worth the effort. The compilation latency will be lower as well if you limit the amount of generated code.

The main idea of COAT is to be a very thin layer on top of a JIT API, not yet another IR layer. It does not have an internal representation of the generated code. It basically just forwards operator calls in C++ to the respective API calls of AsmJit or LLVM. It is surprisingly simple. And that was the main point I wanted to prove with this prototype. We do not need to invent and implement another language layer like UMBRA IR, which is a lot of work. A small and simple library like COAT is good enough, for the most part.

The downside of the whole approach is the eager binding to a backend at compile-time of the C++ project. This leads to a bit of template hell if you want to re-use code fragments for multiple backends. However, it is doable as you can see here: https://github.com/tetzank/sigmod18contest

Other options: You could just add another backend to COAT which emits UMBRA IR or similar. Then you can do whatever you want with the assembled IR, like transforming it through a JIT assembler like AsmJit or an optimizing compiler like LLVM. For COAT it will be just another backend.

If you only want to achieve adaptive execution (first AsmJit, then LLVM), you could also just combine both existing backends and assemble the code simultaneously. AsmJit and LLVM both just record instructions at first; actual compilation only starts during finalization. In this combined backend, you forward calls to both AsmJit and LLVM while generating the code. During finalization, you first finalize with AsmJit and start executing the returned function, while in the background you start the compilation with LLVM, then switch to the optimized function when it is ready. Just a wild idea, not sure if it is feasible.

PS: Thanks for the heads up about the new UMBRA paper. Until now, I was not aware that they added a JIT assembler to their system to overcome the issues with compilation latency. Good to see that it works in a full system.

tetzank avatar Jan 20 '21 23:01 tetzank

I agree that, especially for small use cases, the current approach works better, as the code looks just like normal C++ code (for my engine I would usually write the original code in C++, then replace constructs such as `while` by `loop::while`, and it would directly work). However, for larger projects, relying too heavily on a header-only library is not a good idea, as engine compile times explode.

I'll try out the approach of adding Umbra IR as another backend and see where it leads me. In principle I don't see any problem in having another IR as this would be totally transparent to the user and the extra translation overhead is negligible.

I'm interested in developing the framework because, in my opinion, anyone who does code generation should also do adaptive execution, and frameworks such as coat or flying start are essential for that.

EduardoGRocha avatar Feb 09 '21 00:02 EduardoGRocha

You might be interested in the buildit approach; they require only small changes to the types of variables and support multiple stages instead of just two. https://github.com/BuildIt-lang/buildit

For a useful example of how to actually compile and run the output, see https://github.com/BuildIt-lang/buildit_regex

bftjoe avatar Apr 02 '24 04:04 bftjoe