Implement GPU lane usage profiling
Add r_profilerRenderSubGroups and r_profilerRenderSubGroupsMode.
If r_profilerRenderSubGroups is enabled and the required subgroup extensions are supported, the lightMapping and generic shaders change their output colour to show how many lanes are active in either the VS or the FS, going from dim red -> bright red -> dim green -> bright green as the number of active lanes increases.
Modes:
0: VS opaque
1: VS transparent
2: VS all
3: FS opaque
4: FS transparent
5: FS all
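The mode numbering can be read as two groups of three. Here is a minimal sketch of decoding it, assuming nothing about how the PR itself does it: `ShaderStage`, `SurfaceFilter`, `ProfilerMode` and `DecodeProfilerMode` are hypothetical names for illustration only.

```cpp
// Hypothetical types: the PR only defines the numeric modes 0-5.
enum class ShaderStage { Vertex, Fragment };
enum class SurfaceFilter { Opaque, Transparent, All };

struct ProfilerMode {
    ShaderStage stage;
    SurfaceFilter surfaces;
};

// Decode an r_profilerRenderSubGroupsMode value.
// Returns false (and leaves `out` untouched) for values outside 0-5.
bool DecodeProfilerMode(int mode, ProfilerMode& out) {
    if (mode < 0 || mode > 5) {
        return false;
    }
    // Modes 0-2 profile the vertex shader, modes 3-5 the fragment shader.
    out.stage = mode < 3 ? ShaderStage::Vertex : ShaderStage::Fragment;
    // Within each group the order is: opaque, transparent, all.
    out.surfaces = static_cast<SurfaceFilter>(mode % 3);
    return true;
}
```

For example, mode 4 decodes to the fragment stage with the transparent filter.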
Also improved shader post-processing to leave global uniforms in place to make this work, and made it take comments on the same line into account.
More red = ~more scary~ worse (fewer active lanes), more green = better (more active lanes)
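To make the colour scale concrete, here is a CPU-side sketch of one possible ramp. The breakpoints are assumptions, not the values used by the shaders in this PR: the lower half of the usage range goes from dim red to bright red, the upper half from dim green to bright green. `Colour` and `LaneUsageColour` are hypothetical names.

```cpp
#include <algorithm>

struct Colour {
    float r, g, b;
};

// Map the fraction of active lanes to a debug colour.
// Assumed ramp: usage below 50% is red, 50% and above is green,
// and each half brightens from `dim` up to 1.0 as usage increases.
Colour LaneUsageColour(unsigned activeLanes, unsigned subgroupSize) {
    if (subgroupSize == 0) {
        return {0.0f, 0.0f, 0.0f};
    }
    const float usage = std::min(
        1.0f, static_cast<float>(activeLanes) / static_cast<float>(subgroupSize));
    const float dim = 0.25f;  // assumed brightness at the start of each half

    if (usage < 0.5f) {
        const float t = usage / 0.5f;  // 0 at no lanes, approaching 1 at half
        return {dim + (1.0f - dim) * t, 0.0f, 0.0f};
    }
    const float t = (usage - 0.5f) / 0.5f;  // 0 at half, 1 at full usage
    return {0.0f, dim + (1.0f - dim) * t, 0.0f};
}
```

With this ramp a subgroup with only a few active lanes comes out dim red and a fully occupied subgroup comes out bright green.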
This is an example of using mode 0 on plat23:
Mode 2:
The surfaces resulting from rendering commands with few triangles are the more red ones.
Mode 3:
Mode 5:
The triangle edges, for example, have more inactive lanes.
Dense vegetation on map moonbase, VS lanes are mostly good:
But there are a lot of wasted FS lanes:
I originally thought of doing this because of something that caught my attention in one of the error logs that @illwieckz sent me for some shader change or another. Lanes can become inactive for many reasons: not enough work for the SIMD unit to do, triangle edges, too much register pressure, etc. The log had a line saying something along the lines of only 1 invocation being active at most, so it'd be interesting to see the results of using this on Mesa.
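As an illustration of what "active lanes" means here, the sketch below counts set bits in a ballot mask on the CPU. In GLSL with GL_KHR_shader_subgroup_ballot the equivalent would be something like `subgroupBallotBitCount(subgroupBallot(true))`, though I'm not claiming that is exactly what the shaders in this PR do. `CountActiveLanes` is a hypothetical helper and the 64-bit mask is an assumption that the subgroup has at most 64 lanes.

```cpp
#include <bitset>
#include <cstdint>

// Count the active lanes in a subgroup, given a mask with one bit per lane
// (bit i set means lane i is active). Assumes subgroupSize <= 64.
unsigned CountActiveLanes(std::uint64_t ballotMask, unsigned subgroupSize) {
    // Ignore any bits beyond the lanes that actually exist.
    if (subgroupSize < 64) {
        ballotMask &= (std::uint64_t(1) << subgroupSize) - 1;
    }
    return static_cast<unsigned>(std::bitset<64>(ballotMask).count());
}
```

A draw with a single triangle on a 32-wide subgroup would give a vertex shader mask like `0x7`, i.e. 3 active lanes out of 32, which is the "not enough work" case.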
Requires subgroup extensions.
What happens in a multi-stage shader? Would you only be able to see the last stage?
Yeah, only the last stage. I can add a cvar to make it skip all stages except one though (or a specific material ID, since one material can be e.g. the first stage on one surface and the second stage on another surface).
I've now added r_profilerRenderSubGroupsStage. When set to 0 or higher, it only renders that specific stage/material; when set to -1, it renders all stages on top of each other (as it did before I added this cvar).
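The filtering rule amounts to something like the sketch below. `ShouldProfileStage` is a hypothetical helper, not a function from the PR, and treating every negative value like -1 is an assumption on my part.

```cpp
// Decide whether a stage/material should get the profiler colour output.
// `cvarStage` mirrors the value of r_profilerRenderSubGroupsStage;
// `stageIndex` is the index (or material ID) of the stage being rendered.
bool ShouldProfileStage(int cvarStage, int stageIndex) {
    if (cvarStage < 0) {
        // -1: render all stages on top of each other.
        // Assumption: other negative values behave the same way.
        return true;
    }
    // 0 or higher: only the matching stage/material is rendered.
    return cvarStage == stageIndex;
}
```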
Added a missing check for depth pre-pass surfaces in tr_shade.cpp.
LGTM