Track cache misses from perf counters.
Cache misses should correlate strongly with DRAM accesses, so they should give us a good measure of "time wasted because of memory usage/non-locality".
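A rough sketch of what collecting this could look like (assuming the collector keeps shelling out to `perf`, and with a made-up compiler invocation): `perf stat -x,` emits machine-readable records that are easy to parse.

```rust
use std::process::Command;

/// Run `cmd` under `perf stat` and return (event, count) pairs.
/// Assumes Linux with the `perf` tool installed; the event names are
/// illustrative and their availability varies by CPU and kernel.
fn perf_stat_counts(cmd: &[&str]) -> std::io::Result<Vec<(String, u64)>> {
    // `-x,` prints CSV-style records to stderr: value,unit,event,...
    let output = Command::new("perf")
        .args(["stat", "-x,", "-e", "instructions,cache-misses", "--"])
        .args(cmd)
        .output()?;

    let stderr = String::from_utf8_lossy(&output.stderr);
    let mut counts = Vec::new();
    for line in stderr.lines() {
        let fields: Vec<&str> = line.split(',').collect();
        // Skip headers, "<not counted>" entries and anything else that
        // doesn't parse as a plain integer count.
        if fields.len() >= 3 {
            if let Ok(n) = fields[0].trim().parse::<u64>() {
                counts.push((fields[2].trim().to_string(), n));
            }
        }
    }
    Ok(counts)
}

fn main() -> std::io::Result<()> {
    // Hypothetical invocation; any deterministic compile would do.
    for (event, count) in perf_stat_counts(&["rustc", "--emit=metadata", "lib.rs"])? {
        println!("{event}: {count}");
    }
    Ok(())
}
```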
Not sure why we track "faults", but if that is available, cache misses would be too, I presume?
faults often correlates with max-rss, so I find it moderately useful for spotting when the two go down in tandem. Cache-miss rates probably wouldn't correlate in the same way; I suspect I personally wouldn't find them all that useful.
faults may correlate with allocations, but not with "total DRAM traffic", which is where most of the time not captured by the instruction count goes.
So (instruction_count, cache_miss_count) should strongly correlate with total time spent in userspace, but be more stable. If we also disable sources of randomness, like ASLR, and pin the compiler to its own core, those numbers should be fairly repeatable.
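For what it's worth, a minimal sketch of that setup, pinning via `taskset` and disabling ASLR via `setarch -R` (both from util-linux); the core number and the compiler invocation are placeholders:

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Pin the child to a single core (core 2 is arbitrary) and run it with
    // address space randomization disabled, so the counter readings are
    // more repeatable from run to run.
    let status = Command::new("taskset")
        .args(["-c", "2"])
        // On older util-linux, `setarch` needs an explicit architecture,
        // e.g. `setarch x86_64 -R`.
        .args(["setarch", "-R"])
        // Placeholder compiler invocation; substitute the real benchmark.
        .args(["rustc", "--emit=metadata", "lib.rs"])
        .status()?;
    std::process::exit(status.code().unwrap_or(1));
}
```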
In theory I believe this is the list of performance counters on the collection machine -- @eddyb, would you be willing to look through it and select a few that fit the "cache misses" bit? I suspect "cache-misses" alone may be enough, though we may want to split it into more specific data...
My guess would be cache-misses + LLC-load-misses, but it might be a good idea to grab the data across some rustc change where we expect a drop/increase in memory utilization (a size change of a commonly used type would probably work).
Comparing L1-dcache-load-misses vs L1-dcache-loads could be useful for estimating data locality, but maybe not at the scale of a compiler.
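If both L1 counters were recorded, the locality estimate would just be their ratio; the numbers below are purely illustrative, the real values would come from something like `perf stat -e L1-dcache-loads,L1-dcache-load-misses <rustc invocation>`:

```rust
fn main() {
    // Made-up counter values standing in for perf output.
    let l1_dcache_loads: u64 = 48_000_000_000;
    let l1_dcache_load_misses: u64 = 1_200_000_000;

    // Fraction of L1 data-cache loads that missed: a rough data-locality signal.
    let miss_ratio = l1_dcache_load_misses as f64 / l1_dcache_loads as f64;
    println!("L1d load miss ratio: {:.2}%", miss_ratio * 100.0);
}
```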
https://gist.github.com/Mark-Simulacrum/151b4948d6bd1ab86b5c1d9b36ba64de is the current output of perf list on the collector.
Cache misses are now tracked and stored in the DB. We're also running out of HW counters (I think there's only one left now), so I'm not sure it's worth adding a more detailed cache-miss metric here.
Cache misses are now tracked, so I think that this can be closed.