WIP: renderer: micro-optimize CPU culling and other things
This code is a hot spot, so we had better avoid computing useless things when we can return early, and use bitwise operations instead of relying on the branch predictor guessing right.
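To illustrate the "bitwise operations instead of branches" idea, here is a minimal C++ sketch of a box/plane side test. It is not the code from this PR: the `Plane` struct and `BoxOnPlaneSideSketch` are hypothetical names, and the return convention (1 = fully in front, 2 = fully behind, 3 = straddling) is assumed to follow the classic id Tech 3 `BoxOnPlaneSide`.

```cpp
#include <cassert>

// Hypothetical plane type for this sketch; the engine's own plane type differs.
struct Plane {
    float normal[3];
    float dist;
};

// Classifies an axis-aligned box against a plane.
// Assumed convention (as in id Tech 3): 1 = fully in front,
// 2 = fully behind, 3 = straddling the plane.
int BoxOnPlaneSideSketch(const float mins[3], const float maxs[3], const Plane& p)
{
    float dFar = 0.0f;  // dot product of the normal with the corner farthest along it
    float dNear = 0.0f; // dot product of the normal with the corner nearest along it
    for (int i = 0; i < 3; ++i) {
        assert(mins[i] <= maxs[i]);
        // Per-axis corner selection; compilers may lower these to conditional moves.
        const bool positive = p.normal[i] >= 0.0f;
        dFar += p.normal[i] * (positive ? maxs[i] : mins[i]);
        dNear += p.normal[i] * (positive ? mins[i] : maxs[i]);
    }
    // Combine both comparisons with bitwise operations instead of two if statements.
    return int(dFar >= p.dist) | (int(dNear < p.dist) << 1);
}
```

For a unit box `[-1, 1]^3` and a plane with normal `(1, 0, 0)`, `dist` values of `-5`, `5` and `0` give `1`, `2` and `3` respectively. Whether the generated code is actually branch-free depends on the compiler and target, so the profiler remains the judge.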
The commits are meant to be squashed; they are just small steps I made one by one to check that I was not breaking anything.
While I'm at it, I'm also making minor improvements here and there in tr_world.cpp.
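The early-return idea from the first paragraph can be sketched the same way for frustum culling, which runs a `BoxOnPlaneSide`-style test per plane. This is a hypothetical C++ sketch, not the code from this PR: `Plane`, `Cull`, `CornerDistance` and `CullBoxSketch` are illustrative names, and the planes are assumed to face inward. The corner farthest along each normal is tested first; if even that corner is behind the plane, the box is outside and the function returns without computing the nearest corner or testing the remaining planes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for this sketch; they are not the engine's API.
struct Plane {
    float normal[3];
    float dist;
};

enum class Cull { In, Clip, Out };

// Signed distance to the plane of the box corner that lies farthest
// (farCorner = true) or nearest (farCorner = false) along the plane normal.
static float CornerDistance(const float mins[3], const float maxs[3],
                            const Plane& p, bool farCorner)
{
    float d = 0.0f;
    for (int i = 0; i < 3; ++i) {
        const bool pickMax = (p.normal[i] >= 0.0f) == farCorner;
        d += p.normal[i] * (pickMax ? maxs[i] : mins[i]);
    }
    return d - p.dist;
}

// Tests a box against `count` planes whose front side is assumed to face inward.
Cull CullBoxSketch(const float mins[3], const float maxs[3],
                   const Plane* planes, std::size_t count)
{
    for (int i = 0; i < 3; ++i) {
        assert(mins[i] <= maxs[i]);
    }
    bool anyClip = false;
    for (std::size_t i = 0; i < count; ++i) {
        // Even the farthest corner is behind: the whole box is outside.
        // Return early, skipping the nearest-corner math and the other planes.
        if (CornerDistance(mins, maxs, planes[i], true) < 0.0f) {
            return Cull::Out;
        }
        // Nearest corner behind the plane: the box straddles it.
        anyClip |= (CornerDistance(mins, maxs, planes[i], false) < 0.0f);
    }
    return anyClip ? Cull::Clip : Cull::In;
}
```

A box rejected by the first plane costs a single corner distance, while the full classification is only paid for boxes that survive every plane.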
Before: (profiler screenshot)
After: (profiler screenshot)
The engine now spends 7.5% of its time in RE_RenderScene instead of 9%.
BoxOnPlaneSide is also used in other functions (not visible in the screenshot).
Edit: This is a release-like RelWithDebInfo build with LTO enabled in both cases.
The screenshots were taken with #1043 applied, because this branch was first written on top of it, and the code already looks a bit faster once #1043 is merged:
- https://github.com/DaemonEngine/Daemon/pull/1043
This PR brings an extra performance boost on top of it. While the small speedup from #1043 was not the purpose of that PR, it is the purpose of this one, and both look useful for gaining CPU performance one small step at a time.
I also tested this branch rebased onto the for-0.55.0/sync branch, and I see a speedup there too.
Before: (profiler screenshot)
After: (profiler screenshot)
So I'll start cleaning things up and submit the patches for merging in the near future.
Edit: the percentage is higher than in the profiles from previous comments because this time I used a lower graphics preset, so less time is spent in other parts of the code.