proposal(optimization): create a more compact ZIR representation which avoids storing instruction indices

Open mlugg opened this issue 2 years ago • 0 comments

Note: all new names in this proposal are very open to bikeshedding, I simply came up with them because I needed names.

Problem

Since it's cached, ZIR is arbitrarily long-lived. It's also the majority of Zig's memory usage (excluding LLVM) for compilations which don't make excessive use of comptime. For this reason, it's generally a good idea to try and compact its representation as much as possible.

All in all, we've done pretty well on this, with a nice compact DOD representation. However, one thing stands out as a needless inefficiency. For every body (a block, block_inline, loop, etc), the indices of every instruction within it are stored in extra. Since virtually every instruction is in a body, that means we use 4 * instructions.len bytes of ZIR for every file just storing these indices. For Sema.zig, that's around 25% of its total ZIR size, and it's reasonable to assume that this number will approximately hold for most code.

Proposed Solution

There's an obvious alternative here. In ZIR, rather than having instructions be ordered arbitrarily, place all instructions in a body directly after the instruction containing them, in order. That way, the body only needs to store a length, representing the number of instructions in it. This would presumably reduce the size of ZIR by around 20-25% across the board, which is a major improvement, likely resulting in speedups in semantic analysis. It'd also improve cache locality, since adjacent instructions (which are generally analyzed approximately in order) will be adjacent in memory.

The reason we don't currently have this setup is that it would significantly complicate AstGen. Some of the instruction wrangling logic there can already be a bit convoluted: if we also had an enforced order of instructions, it'd become more complex.

For that reason, I propose splitting ZIR into two representations:

ZIR is as it is today.
CZIR (Compacted Zig Intermediate Representation) is similar to ZIR, but with instructions ordered as described above and bodies modified to just store their length.

The idea here is that AstGen would still emit ZIR (in order to keep it relatively simple), but a new pass - let's call it Compact - would then translate this ZIR into CZIR. There might be similar enhancements we can come up with, such as DCE using some form of liveness analysis, which this pass can also be responsible for. We would then cache the CZIR, and have Sema operate directly on it.

Before implementing this, since it'd require some fairly large changes to Sema, we should try and get an estimate for how it would affect performance: the extra pipeline complexity is probably not worthwhile if it only amounts to memory savings. This isn't easy to do, but here are a couple of ideas:

Research whether we can profile memory access and cache misses to a specific memory region (the ZIR extra data), since we can assume around half of this would be eliminated by this change.
Implement a skeleton of Sema's logic which essentially just does instruction traversal (randomizing branches etc), and compare its running time on old ZIR and new CZIR.

Jun 14 '23 22:06 mlugg