Limit operand to a set of physical registers
It seems like there are currently two options when expressing which registers an Operand can use:
- use
reg_def/reg_use/..., in this case the operand can be allocated to any physical register. - use
reg_fixed_def/reg_fixed_use, in this case the operand can only be a single physical register.
This does not seem to cover the case where only of a limited set of physical registers can be used as an operand.
This is often the case on x86 (32-bit), where some of the general purpose registers don't expose their lower 8 bits as a pseudoregister. This means that setcc and other instructions dealing with a byte can only use the first 4 registers, not eg. esi.
My current workaround is to always use fixed operands for these instructions, but that is a bit too limiting. The other solution is to completely remove the additional registers from the MachineEnv, but that has even bigger drawbacks. Is there a better solution? It seems none of the ISAs supported by wasmtime have this issue, so I can't steal any ideas there.
This is an interesting requirement, and it would certainly be useful, I agree!
It seems technically feasible-ish, and the general solution would be to (i) modify the Requirement lattice to have a notion of register sets (with the appropriate meet function, namely set-intersection), that sublattice embedded in the overall lattice; and (ii) modify the traversal over candidate registers to visit only those in the set.
There may be some subtle forward-progress issues with that; I'm not sure and would have to think a bit more. Also there are potentially interactions with the way that multiple conflicting requirements on one vreg at one time are handled (the "multi-fixed-reg fixup" logic).
Anyway, I invite anyone to experiment with this; I don't have the bandwidth or motivating use-case (from things I'm tasked to work on) to be able to spend time on it unfortunately but I'm happy to point to the appropriate parts of the implementation and/or review code!
One issue I came across when thinking about this some more is the size of Operand, which currently uses bit-packing to fit everything into 32 bits. If a limited set of registers is added as an option, that basically requires adding a PRegSet-sized amount of information, which is 128 bits. Maybe it's not that bad since we only need to support one register class, but it's still going to increase the size of Operand (and related types) by a lot.
Another solution is to only allow a predefined set of register sets to be used as defined in MachineEnv, which can then be referred to with a short index.
I'll think about this some more, and look into the internals of regalloc2 a bit so see how much work this feature ends up being.
Another use case for this would be supporting the RISCV C extension where instructions can only reference 8 registers (either integer or floating point) instead of the 32 available in the regular ISA.
Actually this could completely replace register classes (and fix #47), just annotate each def/usage with the right set of physical registers. Of course that still requires figuring out the bit packing. And maybe it's not very efficient since this would artificially entangle the integer and float register allocation problems.
In the end it's possible we might need to grow Operand slightly, though we should be careful doing so. It might be possible to experiment with e.g. 40 bits by keeping separate arrays of u32 and u8 (this is extraordinarily ugly, yes, but we can wrap it in accessors... and it would lead to better code than a multiply-by-5 packed array indexing). One could put lesser-accessed bits in the u8 part, maybe. Then maybe we could have a notion of a fixed number of subsets of all registers, indexed by a few bits (4? 8?). This is functionally equivalent to the "overlapping register classes" idea that many other allocators have.