circt [arcilator] Introduce simulation orchestration subdialect

This introduces a couple new operations to drive simulation from within an MLIR program, next to the models themselves. This can be lowered by Arcilator to create LLVM functions that run the simulation. arc.sim.emit can be used to print values during the simulation. The semantics of this operation are to send a value to the simulation driver, but for now in practice this amounts to being lowered into a libc print to stdout.

Feb 14 '24 17:02 Moxinilian

@fabianschuiki How do you want to go about this? The verifier is not allowed to check beyond its own operation. The only way would be to embed the model structure in the operation, but that’s super impractical and would not solve the model resolution part.

I think it is fine to only verify at lowering, right? It makes sense to have this logic exist even though the model is not concretized yet.

Feb 15 '24 18:02 Moxinilian

There's this pattern with symbol user verification where your op can resolve a symbol and verify invariants against it. I see potentially two ways to do this:

Embed the port structure into !sim.instance (maybe through the new HW module type thingy), then have sim.instantiate resolve the module name as a symbol and check the port structure is correct. sim.set_input and sim.get_port could then simply check against the type of the instance operand.
Make sim.{instantiate,set_input,get_port} all resolve the module name as a symbol (taken from the type -- I think I saw the CIL work to a similar thing for C++ classes and members) and then use the actual module to check its things.

But in general I think it's desirable to capture malformed IR early on, not only at a late stage during lowering. Especially if its structurally malformed, as in accessing non-existing ports or modules altogether.

@maerhart What's your take on this?

Feb 15 '24 19:02 fabianschuiki

It's really cool to see all this progress in this direction! Can't wait to implement the first arcilator integration test using this infrastructure.

I agree with @fabianschuiki that these things should not be checked in the lowering pass.

Regarding the two possibilities for implementing this, I would probably go with the second one for the following reasons:

Having all the ports in the type is fine, but if we also want to access registers, we would need to add those as well, which leads to a very big type struct (we could solve the assembly format readability with type aliases though)
It potentially allows us to retain the instance hierarchy of registers/wires (not sure if we need/want this)
This type can be shared with the allocation pass of the arcilator pipeline to specify the memory layout; we can then get rid of the hardcoded integer offsets and use those get/set operations there as well. It is important there that the memory layout can be easily changed/optimized which is not the case if all the types have to be changed. The indirection via a symbol solves this, but at the cost of slower verifiers.
Materializing the layout like this in the IR enables us to get rid of the custom data structures in ModelInfo.h and simplifies the printing to JSON.

The first approach also has its advantages, though:

Easier to implement
More efficient verifiers

It could roughly look like the following (I haven't thought a lot about it, and I haven't checked how CIL does it, but just to give an idea of what I'm talking about, maybe @fabianschuiki as some ideas/comments on how this should look like):

arc.model @modelName layout @modelNameLayout {...}
arc.model_layout @modelNameLayout {
  arc.layout_entry "input0" input : i32 // We could also have separate operations for input, output, register, etc.
  arc.layout_entry "reg0" register : i32 // those entries could additionally specify whether they are private (i.e. not accessible from the testbench), read only, write only, or read/write according to the tap operations which can enable more aggressive optimization
  arc.layout_entry "memory" memory : !arc.memory<...>
  arc.layout_entry "wire0" wire : i32
  arc.layout_entry "out0" output : i32
  arc.layout_entry padding : i8 // supporting manual padding could help optimizing cache accesses
  // names are verified to be unique
}
func.func @main() {
  %model_state = arc.sim.allocate_model @modelNameLayout
  arc.sim.set_input %model_state["j"] = %c : !arc.model<@modelNameLayout>, i8
  arc.sim.step %model_state with @modelName : !arc.model<@modelNameLayout> // verifies that @modelName accepts @modelNameLayout
}

We could also specify the layout in a separate region of the arc.model, haven't thought about the trade-offs regarding this yet.

This would probably require 2 additional PRs: one adding the operations to represent the layout, and another one adding logic to materialize those operations during the arcilator pipeline.

Usually, it's preferred to add the verifiers along with the operations. However, given the complexity, I'd personally also be fine with merging this PR without and adding them in a follow-up PR once those two other PRs are in as well (if we can be sure the verifiers will be added within a reasonable amount of time).

Feb 15 '24 22:02 maerhart

@maerhart I like your suggestion a lot. But you're also right in that it's a pretty big code change that we'd have to sequence before this PR here in order to get the verifiers going. Agreed on getting this PR landed as it's highly beneficial, and then tackle improved verifiers as we land improved representation of the model's data layout as separate ops.

Feb 15 '24 22:02 fabianschuiki

So it is a bit silly but I genuinely do not believe verifiers are allowed to look at anything other than the local operation. We had an extensive conversation about this on the MLIR Discord server and the problem is that it is frequent for rewrite pattern drivers to run verifiers in the context of passes, which themselves can be parallel and thus looking at things outside the operation potentially creates data races.

The only place where I am absolutely sure it is legal is in the lowering pass, as we ensure it is ran single-threaded by having it be a pass on the ModuleOp.

Feb 15 '24 22:02 Moxinilian

Since the !arc.sim.instance type will have a symbol parameter, every op with such a type will implement the symbolUser interface which can look up symbol-defining ops in verifySymbolUses. It works the same way as call operations verify that the arguments match the types of the function it calls.

Feb 15 '24 22:02 maerhart

So it is a bit silly but I genuinely do not believe verifiers are allowed to look at anything other than the local operation.

Yeah, its true. I think most people are aware by now, but we have lots of verifiers that are breaking that rule. It has a nice section in the MLIR Developer guide, and the section for adding custom verifiers to ops explains how to properly verify regions.

Feb 15 '24 22:02 youngar

The symbol use verifier can still access the op referenced by the symbol though, right?

Feb 15 '24 23:02 fabianschuiki

The symbol use verifier can still access the op referenced by the symbol though, right?

Yes, the verifySymbolUses function is called in the region verifier of the operation that defines the symbol table. In our case, that would be the builtin.module.

Feb 15 '24 23:02 maerhart

Okay perfect, so that's a good spot to actually go resolve the model name to the model op and make sure that it exists (in instantiate) and that the corresponding ports exist (set_input and get_port).

Feb 15 '24 23:02 fabianschuiki

Ah yes! I forgot about region verification. In that case yes, I'll set up the symbol-based thing.

Feb 15 '24 23:02 Moxinilian

So there is a problem with the symbol approach: if the validity of ports is enforced at all times, this means simulation logic cannot point to high level things that would potentially be generated later.

One way to go around this would be to have high level operations implement some interface in which they describe the ports the expose. For example, hw.module would. But while it is easy for input and output, for registers it seems like an unreasonable amount of analysis to do at verification time (you would have to find all the registers in the module body recursively).

In contrast, detecting it at lowering time makes sure all the ports are fixed and defined when we verify them. I agree it is not elegant at all but I am having a hard time finding a way around this.

A middle ground could be to only accept inputs and outputs, but not only would that be fairly restrictive, it would also not work for high level things that generate multiple modules.

Feb 16 '24 10:02 Moxinilian

@fabianschuiki

Benchmark run successful!
----------------------------------------
412348 cycles total
vtor: 63630.1 Hz
arcs: 477011 Hz

Feb 27 '24 20:02 Moxinilian