Introducing `Box`es in Qiskit
Summary
This RFC describes a new Box instruction that would be added to Qiskit as a way to express groupings of instructions that can have data attached, can be sent up and down the stack, and can pass through transpilation. It provides details of its implementation and its interaction with other Qiskit features, and it discusses the benefits of Box in key contexts such as twirling and mitigation.
Thanks for the feedback @pedrorrivero !
I didn’t see a clear way to declare nested box structures without potentially running into "indentation hell." Are there any plans or thoughts on addressing this?
I'm wondering if you might be able to provide an example to guide us here? I can try to give a general answer, but it might not touch on what you want. Just like other existing instructions that own one or more blocks with their own scopes (switch, if, for, etc.), yes, the QuantumCircuit object will allow the construction of arbitrary nestings so long as everything is well defined. However, this doesn't imply that any particular execution agent will be willing to interpret and execute such circuits. Just as the runtime primitives have validation for various things today, I expect they won't accept nested boxes of certain flavours.
Is it intended that annotations can enforce checks within the box? For instance, if I want to ensure that my box contains at most one operation per qubit, is there a mechanism to enforce such constraints?
Something will enforce these constraints, but this RFC doesn't specify whether an annotation will declare its own validation method, or whether some other entity will enforce these constraints pre-submission.
On point 1:
Since Box is a ControlFlowOp, you'll be able to construct a Box instruction object manually with Box(body_qc, annotations=[]), and then append it to a circuit with qc.append(box, qubits=[...], clbits=[...]). When the control-flow operations were introduced to Qiskit, that was the only way, and in practice, it's super fiddly to do and get right. That's why the control-flow builder interface was introduced, but the old method is still available. You can of course also use all the other tools to construct a QuantumCircuit - building components of the circuit and calling compose onto a larger one, writing the circuit (or parts of it) in OpenQASM 3 and loading it in with qasm3.load, etc.
On another note: you say "indentation hell", but a box is a logical scope - in all programming languages I'm familiar with, scopes are conventionally (or mandatorily) indicated by a layer of indentation, which is exactly what with qc.box(): ... introduces. Is there something more than that that you're worried about?
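To make the two construction routes and the nesting concern concrete, here is a minimal stdlib-only sketch. `ToyCircuit`, `ToyBox`, and `entangling_layer` are hypothetical stand-ins invented for illustration; they are not Qiskit classes and do not reflect how the builder interface is actually implemented. The sketch only shows that a context-manager builder adds one indentation level per logical scope, and that helper functions can keep deep nestings flat at the call site.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class ToyBox:
    """Hypothetical stand-in for a box: a body plus attached annotations."""
    body: list = field(default_factory=list)
    annotations: tuple = ()


class ToyCircuit:
    """Hypothetical circuit that records instructions into the innermost open scope."""

    def __init__(self):
        self.instructions = []
        self._scopes = [self.instructions]  # stack of currently open bodies

    def gate(self, name, *qubits):
        self._scopes[-1].append((name, qubits))

    def append(self, box):
        # "Manual" route: the caller builds the ToyBox first, then appends it.
        self._scopes[-1].append(box)

    @contextmanager
    def box(self, annotations=()):
        # "Builder" route: one indentation level per logical scope.
        new_box = ToyBox(annotations=tuple(annotations))
        self._scopes.append(new_box.body)
        try:
            yield new_box
        finally:
            self._scopes.pop()
        self._scopes[-1].append(new_box)


def entangling_layer(circuit, pairs):
    """Helper that fills whatever scope is open, keeping the call site flat."""
    for a, b in pairs:
        circuit.gate("cx", a, b)


qc = ToyCircuit()
with qc.box(annotations=["twirl"]):
    qc.gate("h", 0)
    with qc.box():  # nested scope: one more indentation level
        entangling_layer(qc, [(0, 1), (2, 3)])

# Manual construction of a box, appended afterwards.
manual = ToyBox(body=[("x", (0,))], annotations=("manual",))
qc.append(manual)

outer = qc.instructions[0]
inner = outer.body[1]
```

Whether a given execution agent accepts the nested result is a separate question from whether it can be constructed.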
On point 2:
Not all annotation validations might be able to work knowing only the local scope of the box that's just been created, so it would be limiting to mix in validation concerns along with construction ones. The validity of an annotation might depend on other boxes in the circuit, on the capabilities of the backend it's targeted for, etc, so I would suggest that "at construction time" isn't the best place for the checking. Part of the "backend extension" section above is getting towards the idea that it could one day be the domain of the transpiler and custom transpiler passes (like how the transpiler validates basis gates, coupling, etc), but we don't go all the way in this RFC - there's no need to try and legislate too far in advance yet.
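A rough sketch of what keeping validation separate from construction could look like, using the "at most one operation per qubit" constraint from the question as the example. Everything here is assumed for illustration: `ToyBox`, the `"single_layer"` annotation name, and the `supported_annotations` set standing in for backend capabilities are hypothetical and are not part of the RFC.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToyBox:
    ops: list              # list of (gate_name, qubits) tuples
    annotations: tuple = ()


def crowded_qubits(box):
    """Return the qubits touched by more than one operation in the box."""
    counts = Counter(q for _, qubits in box.ops for q in qubits)
    return sorted(q for q, n in counts.items() if n > 1)


def validate(boxes, supported_annotations):
    """Post-construction pass: collect problems instead of raising while building."""
    problems = []
    for index, box in enumerate(boxes):
        for annotation in box.annotations:
            # Depends on context outside the box (what the target supports).
            if annotation not in supported_annotations:
                problems.append(f"box {index}: unsupported annotation {annotation!r}")
        if "single_layer" in box.annotations:
            crowded = crowded_qubits(box)
            if crowded:
                problems.append(f"box {index}: qubits {crowded} used more than once")
    return problems


boxes = [
    ToyBox(ops=[("cx", (0, 1)), ("x", (2,))], annotations=("single_layer",)),
    ToyBox(ops=[("h", (0,)), ("cx", (0, 1))], annotations=("single_layer",)),
    ToyBox(ops=[("x", (0,))], annotations=("experimental",)),
]
problems = validate(boxes, supported_annotations={"single_layer"})
```

Because the check runs after the whole circuit exists, it can take inputs such as other boxes or the target's capabilities that are not available while a single box is being built.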
I like this RFC! The only point that popped up while reading the RFC was on equivalency. I see that a user can specify an annotation that uniquely identifies a box by using e.g. uuid. I was wondering whether we wanted to enforce some kind of default behaviour for boxes that would allow follow-up transpiler passes to immediately identify equivalent boxes instead of relying on the user to correctly provide uuids. A user may define a box and append it to a circuit multiple times. If we have an efficient way of checking the equivalency of two boxes in the circuit, a transpiler pass may be able to reuse any optimizations performed for the first occurrence of a box. Without a default box identifier, a transpiler pass may first need to establish box equivalency by performing e.g. graph isomorphism checks which may offset the benefits of using cached optimizations.
Sebastian: the RFC is deliberately trying not to do that, with the whole "boxes can't be re-used verbatim" thing. It can very very occasionally work that the compilation of a virtual-circuit box could be re-used verbatim for another instance of the virtual-circuit box, but that depends on a whole bunch of things that aren't local to the box: the layout of the circuit, the routing, the instruction set of the backend target, etc. QuantumCircuit and the core transpiler need to be conservative, so an automatic marking trying to get at "this is the same box and must be compiled the same every time" would be nearly unusable - it'd have to be identical for every modification the transpiler might make, and that's not something we have algorithmic support for. Routing is the big one here, especially because it's non-local effects from the boxes that make them incompatible, but that then ties into your wants for optimisation: an optimisation isn't valid to be re-used if the box doesn't take place on the same hardware qubits.
The idea of "reusable optimisation" is explicitly punted from this RFC - that would be a separate "function call" sort of instruction. That's obviously really really interesting, but "reusability" is a separate concern to "box".
My understanding of this RFC was that boxes are reusable?
I think it would be up to the transpiler pass to decide what it does with equivalent boxes. Granted, a transpiler pass would often have to consider things outside of the box definition to make that decision but e.g. for synthesis on a homogeneous gate set, you could potentially incur a large runtime speedup when encountering m equivalent n-qubit boxes in your quantum circuit. Reusing peephole optimizations (or many other passes in the init stage of the default passmanager) within a box is another example that appears to be workable in the future.
On the other hand, I don't want to suggest any kind of scope creep and it appears that boxes are useful outside of these kinds of use cases.
There's some more discussion up here: https://github.com/Qiskit/RFCs/pull/76#discussion_r1915639844.
I think it would be up to the transpiler pass to decide what it does with equivalent boxes.
There are built-in transpiler passes, which (almost) always run, that will break your definition of "equivalent" from the virtual-circuit perspective immediately. I totally agree that it would be great if we could re-use optimisations, but it's very much not trivial to do this.
I think you're thinking of equivalence and optimisations in quite high-level abstract terms, where we reason about the quantum hardware in very homogeneous terms. Having an "auto equivalence" might be useful here, but that's not related to box - it's about any high-level construct, like a multiplexor or a high-arity QFT as well. We already have a mechanism to mark a composite instruction as reused - it's to make a custom subclass of Instruction or the like. We don't do anything with that information yet, but the principle is there for future expansion, and then it'd auto apply to other high-arity / complex instructions too.
Once we've moved past abstract optimisations, the next thing that happens is layout and routing. Both of these map all instructions from the virtual, nicely homogeneous "algorithm designer's" space down to the physical hardware, where we lose almost all the homogeneity pretty immediately. This doesn't even require a heterogeneous basis set in the sense of "different 2q operations on different links" - the connectivity graph of the physical qubits and their error rates are already significant inhomogeneities for both routing and all subsequent optimisations. So now the equivalence of a group of instructions is that they have to have been routed onto the same hardware qubits in the same orders (to match error rates), or if errors aren't considered, then they still need to have been mapped to an isomorphic subgraph with equivalent swaps. We don't have any routing algorithm that can enforce that, so if reuse of the box structure implied that equivalence was required to this level, then the only thing box re-use could be for would be super low-level applications where you're putting the box on the same physical qubits, and so you're already likely to have done the vast majority of the work that you might want from the transpiler.
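The point about physical placement can be illustrated with a toy reuse cache. This is only a sketch under strong simplifying assumptions: `reuse_key` and `compile_box` are hypothetical helpers, the "compiled" result is a placeholder string, and swaps, error rates, and subgraph isomorphism are all ignored. It shows just one thing: once the key includes the physical qubits, two textually identical virtual boxes stop hitting the same cache entry unless they land on the same hardware qubits.

```python
def reuse_key(ops, layout):
    """Key for reusing a compiled box: the contents AND their physical placement."""
    return tuple((name, tuple(layout[q] for q in qubits)) for name, qubits in ops)


cache = {}


def compile_box(ops, layout):
    """Return (compiled_result, cache_hit) for a box under a virtual-to-physical layout."""
    key = reuse_key(ops, layout)
    hit = key in cache
    if not hit:
        cache[key] = f"compiled#{len(cache)}"  # placeholder for real compilation work
    return cache[key], hit


ops = (("cx", (0, 1)), ("h", (1,)))

first = compile_box(ops, {0: 5, 1: 6})   # first occurrence: compiled from scratch
second = compile_box(ops, {0: 5, 1: 6})  # same physical qubits: reusable
third = compile_box(ops, {0: 9, 1: 4})   # same virtual box, different placement: miss
```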
I think there is a use for the latter thing, including potential re-use, but I don't think it can work as the default setting. I'm interested in making it opt-in via something like built-in annotations, something like Verbatim (fail if layout/routing is required, no optimisation within), OptimizationLevel(x) (apply a different optimization level within the block), etc. That bit of the design isn't well thought out in my mind (and doesn't need to be in MVP0), but it's definitely an area for future expansion.
I see that a user can specify an annotation that uniquely identifies a box by using e.g. uuid.
I think I may have unnecessarily injected some confusion into the PR by naming that annotation "uuid". The goal there is more narrow than to annotate the box and its contents as being uniquely universally identifiable, implying any two boxes sharing the uuid must be exactly equivalent.
The idea instead is that the annotation itself should be a uuid, or perhaps more properly a uid, so that it can be used as a marker at execution time to attach execution-time information external to the circuit, such as noise model injection. For this to work, there need not be a promise that the contents of all boxes sharing a uuid must be equal, only that each box with a particular uuid must be compatible with whatever is attached to it. In the cases I'm thinking of, these compatibility constraints come down to something easy like qubit count, and can be checked at validation time. It's true that built-in tooling and typical workflows will always have the behaviour of, say, only attaching a particular noise model to boxes with identical contents. But to demand this by (difficult to implement) contract would be overbearing.
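As a sketch of the marker idea, the snippet below attaches external data to boxes by uid and checks only compatibility (qubit count), not equality of contents. `ToyBox`, `ToyNoiseModel`, and `attach` are hypothetical names made up for this illustration; the RFC does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBox:
    uid: str          # hypothetical marker annotation
    num_qubits: int
    ops: tuple


@dataclass(frozen=True)
class ToyNoiseModel:
    num_qubits: int
    label: str


def attach(boxes, noise_by_uid):
    """Pair each box with the external data registered under its uid.

    Only compatibility (here: qubit count) is checked, not equality of contents.
    """
    pairs = []
    for box in boxes:
        model = noise_by_uid.get(box.uid)
        if model is None:
            continue  # nothing registered for this marker
        if model.num_qubits != box.num_qubits:
            raise ValueError(
                f"uid {box.uid!r}: model expects {model.num_qubits} qubits, "
                f"box has {box.num_qubits}"
            )
        pairs.append((box, model))
    return pairs


boxes = [
    ToyBox("layer-a", 2, (("cx", (0, 1)),)),
    ToyBox("layer-a", 2, (("cz", (0, 1)),)),  # same uid, different contents
    ToyBox("untracked", 1, (("x", (0,)),)),
]
noise = {"layer-a": ToyNoiseModel(2, "two-qubit layer noise")}
pairs = attach(boxes, noise)
```

Both "layer-a" boxes receive the same model even though their contents differ, which matches the narrower contract described above.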
This looks super useful.
Mutability: I think the data in an annotation should be immutable. In Python, you'd have trouble enforcing this. But in any case, it should be made clear whether the data is immutable throughout Qiskit transpilation and perhaps other layers of the stack.
noop: I wonder if a better solution than noop could be found. What you want to convey is that the box includes certain resources. This data could also be passed explicitly when constructing the box. Then, no pass or other analysis tool would have to search for the noops. If this data is already stored in the box structure behind the scenes upon construction, all the more reason to have a direct way to specify it.
The RFC presumably allows noop to be available for any other purpose that users find.
If this is really what we want, fine. But I'm uncomfortable with it happening as a byproduct of specifying the edges of a box.
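One way to picture the alternative to noop is a box whose resources can be declared directly at construction. The sketch below is purely illustrative: `ToyBox` and its `extra_qubits` parameter are hypothetical and are not proposed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBox:
    ops: tuple                            # (gate_name, qubits) tuples
    extra_qubits: frozenset = frozenset()  # hypothetical: idle qubits the box spans

    @property
    def qubits(self):
        """All qubits the box covers: those its operations use plus those declared."""
        used = frozenset(q for _, qubits in self.ops for q in qubits)
        return used | self.extra_qubits


# The box acts on qubits 0 and 1 but is declared to span qubit 2 as well,
# without inserting any placeholder instruction into its body.
box = ToyBox(ops=(("cx", (0, 1)),), extra_qubits=frozenset({2}))
covered = box.qubits
```

With the resources stored on the box itself, a pass can read them directly rather than scanning the body for placeholder instructions.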
Versions: Boxes and annotations may be produced and consumed in different places and times. And the annotation carries some data of non-trivial complexity. I expect that the format of some annotations will be modified after they are introduced. Making some allowance for versioning of boxes (or the annotations, really) would complicate the RFC. But putting off questions of versioning will cause headaches down the road.
Reusability: It looks like a box carries two substructures, one for the circuit data and one for "annotations". It's clear that the box, including its circuit data, cannot be reused. But I imagine some circuits might repeat a box with the same "annotations" many times. In Python, the natural way to do this, for convenience as well as memory efficiency, is to assign the annotation structure to a variable and then use this variable in each box to refer to the data. In the RFC, the word "self-contained" is used. But references to annotations are not really self-contained. Note that the question of immutability is important here. Typically, in programming languages, immutable elements can be deduplicated as an optimization without changing semantics.
The RFC already mentions identifying an annotation with a uid, for other purposes. If the annotation is immutable and may be specified by a reference (in Python, this is something like the id) then the uid is redundant. Maybe this is ok.
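The interaction between immutability, sharing by reference, and deduplication can be sketched with a frozen dataclass. `ToyAnnotation` and the `"example.twirl"` namespace are hypothetical; this is not how the RFC defines annotations, only an illustration of the language-level behaviour being described.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ToyAnnotation:
    namespace: str
    payload: tuple  # tuples keep the payload hashable and immutable


twirl = ToyAnnotation("example.twirl", ("pauli",))

# Three boxes refer to the same annotation object, so nothing is copied.
boxes = [{"ops": [("cx", (0, 1))], "annotations": (twirl,)} for _ in range(3)]
shared = all(box["annotations"][0] is twirl for box in boxes)

# An equal-by-value annotation created independently can be deduplicated
# safely, because no holder is able to mutate it afterwards.
duplicate = ToyAnnotation("example.twirl", ("pauli",))
unique = {twirl, duplicate}

# Attempted mutation is rejected by the frozen dataclass.
try:
    twirl.namespace = "changed"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Note that `frozen=True` only guards attribute assignment; a mutable payload such as a list could still be changed in place, which is part of why enforcing immutability in Python is difficult.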
At some point, it might be useful to implement hierarchical "grouping", like drawing programs do. A group contains an arbitrary collection of elements. But I can't think of a use off the top of my head.
Control flow semantics: Do we really want methods like replace_blocks?
Python-centrism: In practice, annotations containing fields such as strings and lists of floats are straightforward in most languages and platforms. But allowing other Python objects in annotations would complicate this picture. Restricting annotations a bit might make the serialization story simpler and more secure, as well. Also, ControlFlowOp will be implemented in Rust eventually. So it's best to think about compatibility now.
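A minimal sketch of what restricting annotation payloads and carrying a version could look like, assuming a JSON encoding. The record layout (`namespace`, `version`, `payload`) and the helper names are hypothetical choices for illustration, not a format defined by the RFC.

```python
import json

ALLOWED_SCALARS = (str, int, float, bool, type(None))


def is_portable(value):
    """True if value is built only from JSON-like types (no arbitrary Python objects)."""
    if isinstance(value, ALLOWED_SCALARS):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_portable(item) for item in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_portable(v) for k, v in value.items())
    return False


def dump_annotation(namespace, version, payload):
    """Serialize an annotation, refusing payloads that other platforms could not read."""
    if not is_portable(payload):
        raise TypeError(f"payload for {namespace!r} contains non-portable objects")
    record = {"namespace": namespace, "version": version, "payload": payload}
    return json.dumps(record, sort_keys=True)


def load_annotation(text, supported_versions):
    """Parse an annotation and reject versions this consumer does not understand."""
    record = json.loads(text)
    known = supported_versions.get(record["namespace"], ())
    if record["version"] not in known:
        raise ValueError(
            f"unsupported version {record['version']} for {record['namespace']!r}"
        )
    return record


text = dump_annotation("example.noise", 1, {"rates": [0.01, 0.02], "label": "layer-a"})
record = load_annotation(text, {"example.noise": {1}})
```

A consumer written in another language only needs a JSON parser and the version table to decide whether it can interpret a given annotation.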
I'm very much on board with the client-side validation. And more broadly, I'd consider specifying some kind of structure and semantics for annotation "metadata" (or "tags") as opposed to the more free-form annotation data. I mean things like "cannot ignore" that are discussed in the RFC.
This can be merged now, no?
@blakejohnson yes, we are happy with it being merged