Minor interface change proposal
I'd like to make the following two minor changes to the interface:
---- (1) ---- [RA run time speedup support] ----
The client should tell us how many virtual registers are used in the incoming vcode. Or more precisely, it should tell us the index of the highest-numbered vreg + 1. This is so as to make it possible to pre-allocate the bit vectors needed for dense sets of vregs at the final required size. This will be important when we come to move to a better Set representation; and the day for that is coming closer. Note that from the input we already know (for example) the number of blocks in the input, so preallocation for those is already possible; but currently vregs remains an exception.
---- (2) ---- [Reducing saves/restores in the prologue/epilogue for leaf fns] ----
The client should be able to optionally tell us that "this function is a leaf function". Currently we have the convention that the registers listed first (in each class) in the Universe are callee saves registers, and at least the BT algorithm uses those in preference -- it literally scans forward through the class, looking for the first available candidate. This is a heuristic hack that has the beneficial effect of reducing spilling around calls.
However, it has a bad effect for leaf functions: it means leaf functions must save and restore callee-saves registers in their prologues/epilogues, yet there is no "win" of reduced spilling around calls to offset that (since they are leaf functions). Better would be, for a leaf function, to scan backwards through the candidates, hence selecting caller-saves registers in preference, in the hope of avoiding using callee-saves regs at all. Then the existing prologue-construction machinery will omit saves/restores of them. Given that leaf functions are sometimes small, this might be a significant (and easy) win.
The client's instruction selector can easily enough keep a boolean which it sets to true if it emits a call insn in the vcode. No need for a separate scan. So it's zero (compiler run time) cost on the client's side. Clients that are unable or unwilling to do this can say "no this isn't a leaf function" -- the current status quo -- and then will miss out on this opportunity, but the allocation will still be correct.