Testing branches for Reaper

Open illwieckz opened this issue 1 year ago • 7 comments

Testing: https://github.com/VReaperV/Daemon/tree/material-stages-tex

System:

GPU: AMD Radeon PRO W7600 CPU: AMD Ryzen Threadripper PRO 3955WX resolution: 3840×2160 preset: ultra

Framerate on default spectator scenes:

| | default | material | tex |
|---|---|---|---|
| plat23 | 433 | 354 | 360 |
| metro | 672 | 480 | 483 |
| habitat | 435 | ☠️ | 375 |
| station12 | 108 | 221 | 234 |

illwieckz, Nov 04 '24 19:11

Hmm, that's interesting as I got slightly lower fps on metro and habitat with https://github.com/VReaperV/Daemon/tree/material-stages-tex. I'm guessing it's down to a difference in how the drivers are handling the respective buffers.

VReaperV, Nov 04 '24 21:11

It looks like the slowdown I was seeing was actually due to bugs on master. Now I get the same or higher fps on the branch above.

VReaperV, Nov 04 '24 22:11

Testing: https://github.com/VReaperV/Daemon/tree/test-no-multidraw

| | plat23 default | plat23 359 290 42 176 -7 | habitat default |
|---|---|---|---|
| no-multidraw no-material | 367 | 460 | 261 |
| multidraw no-material | 360 | 452 | 270 |
| multidraw material | 302 | 350 | 321 |

The difference between multidraw and no multidraw could be noise.

illwieckz, Nov 08 '24 20:11

> The difference between multidraw and no multidraw could be noise.

Yes, I reran “multidraw no-material” with plat23, and now it is:

| | plat23 default | plat23 359 290 42 176 -7 | habitat default |
|---|---|---|---|
| no multidraw no material | 367 | 460 | 261 |
| multidraw no material | 369 | 461 | 270 |
| no multidraw material | 302 | 350 | 321 |

illwieckz, Nov 08 '24 20:11

Hmm, interesting, I got slightly better performance with the test-no-multidraw branch, but maybe that was just a fluke.

It's interesting that habitat now shows better performance with the material system than without it, unlike in the first test here. That's probably due to the fixes I made earlier.

VReaperV, Nov 08 '24 22:11

Oh, btw @illwieckz, what results do you get on master/test-no-multidraw without r_materialSystem, and on master with r_materialSystem, while using r_profilerRenderSubGroups on? I'd be interested in a screenshot from the plat23 default view.

VReaperV, Nov 08 '24 22:11

I've been thinking that quantising the stage data and offloading textures to a buffer with a fixed layout might improve this further.

For context, right now each drawSurf gets its own copy of the surface data in the buffer, so a lot of data is duplicated. Additionally, that data currently spans 128 and 192 bytes for the generic and lightMapping shaders respectively, which are 2 of the most abundant ones. This means a typical cache line can only fit 0.5 or 1 of the former, while the latter will overfetch. It also increases bandwidth usage for updating this data, and makes merging surfaces into one draw command impossible (unless switching to Vulkan, or using an Nvidia extension which didn't even work in that regard on my end).
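The cache-line arithmetic above can be sketched as follows. This is only an illustration: the function names are made up, the 64- and 128-byte line sizes are assumed typical values rather than measurements of any particular GPU, and each record is assumed to start on a line boundary.

```cpp
#include <cstddef>

// How many records fit in one cache line (may be fractional, e.g. 0.5).
constexpr double recordsPerLine(std::size_t recordBytes, std::size_t lineBytes) {
    return static_cast<double>(lineBytes) / static_cast<double>(recordBytes);
}

// Cache lines touched when reading one record that starts on a line boundary.
constexpr std::size_t linesTouched(std::size_t recordBytes, std::size_t lineBytes) {
    return (recordBytes + lineBytes - 1) / lineBytes;
}

// Bytes pulled in by those lines that do not belong to the record.
constexpr std::size_t overfetchBytes(std::size_t recordBytes, std::size_t lineBytes) {
    return linesTouched(recordBytes, lineBytes) * lineBytes - recordBytes;
}

// Sizes quoted in the comment above; the line sizes are assumptions.
static_assert(linesTouched(128, 64) == 2, "128-byte record spans two 64-byte lines");
static_assert(overfetchBytes(192, 128) == 64, "192-byte record wastes 64 bytes on 128-byte lines");
```

Under these assumptions a 128-byte record gives `recordsPerLine(128, 64) == 0.5` and `recordsPerLine(128, 128) == 1.0`, while a 192-byte record on 128-byte lines touches two lines and fetches 64 bytes it does not use.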

The reason each surface copies its data (I tried just storing data per-stage first instead) is that some of the data is per-surface: lightmap, deluxemap and light factor. The https://github.com/VReaperV/Daemon/tree/material-stages-tex branch offloads some of the data to a different buffer to work around this issue. However, after looking at the shaders and uniforms, I believe I can fit all of the generic and lightMapping shader stage data into 8 and 20 bytes per stage respectively, while storing the textures in a different, fixed-layout buffer. The stage data can then even be put into a uniform buffer, which might work a little faster. 16 bits could be used to index the stage, with the remaining 16 bits used to store light factor and an index to textures and lightmap/deluxemap. Light factor can even be just 1 bit, since it's always either 1.0 or the map light factor, which can be set as a global uniform.
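A minimal sketch of such a packed per-surface word is below. The layout is hypothetical: the comment only says 16 bits index the stage and the other 16 hold the light factor and a texture/lightmap index, so the 1 + 15 split of the upper half, the bit order, and all names here are assumptions rather than what the branch does.

```cpp
#include <cstdint>

// Hypothetical 32-bit per-surface word (the split of the upper 16 bits is assumed):
//   bits  0..15  index of the shared per-stage record
//   bit   16     1 = use the global map light factor, 0 = use 1.0
//   bits 17..31  index into the fixed-layout texture/lightmap buffer
struct SurfaceRef {
    std::uint32_t stageIndex;
    bool useMapLightFactor;
    std::uint32_t textureIndex;
};

constexpr std::uint32_t kStageBits = 16;
constexpr std::uint32_t kTextureBits = 15;
constexpr std::uint32_t kStageMask = (1u << kStageBits) - 1u;
constexpr std::uint32_t kTextureMask = (1u << kTextureBits) - 1u;

// Pack the three fields into one word; out-of-range indices are masked off.
inline std::uint32_t packSurfaceRef(const SurfaceRef& ref) {
    return (ref.stageIndex & kStageMask)
         | (static_cast<std::uint32_t>(ref.useMapLightFactor) << kStageBits)
         | ((ref.textureIndex & kTextureMask) << (kStageBits + 1u));
}

// Inverse of packSurfaceRef.
inline SurfaceRef unpackSurfaceRef(std::uint32_t packed) {
    return SurfaceRef{
        packed & kStageMask,
        ((packed >> kStageBits) & 1u) != 0u,
        (packed >> (kStageBits + 1u)) & kTextureMask
    };
}

// The 1-bit flag selects between 1.0 and the map-wide factor (a global uniform).
inline float lightFactor(const SurfaceRef& ref, float mapLightFactor) {
    return ref.useMapLightFactor ? mapLightFactor : 1.0f;
}
```

With this split a surface costs 4 bytes in addition to the shared stage record, at the price of capping stages at 65536 and texture sets at 32768; a shader would do the same shifts and masks on a `uint`.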

Only the texture index would then prevent merging different surfaces (other than having a different material, that is), since textures can only be indexed with a dynamically uniform value. From my testing it seems this should allow merging lots of different surfaces.
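To illustrate the merging rule, here is a rough CPU-side sketch. It is not the engine's code: the struct fields, the requirement that merged surfaces occupy contiguous index ranges, and the assumption that the input is already sorted by material and texture index are all mine.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-surface draw info; field names are illustrative only.
struct Surface {
    std::uint32_t materialId;
    std::uint32_t textureIndex;  // must be dynamically uniform within one draw
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
};

struct DrawBatch {
    std::uint32_t materialId;
    std::uint32_t textureIndex;
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
};

// Merge neighbouring surfaces that share a material and a texture index and
// whose index ranges are contiguous, so each batch can be one draw command.
inline std::vector<DrawBatch> mergeSurfaces(const std::vector<Surface>& surfaces) {
    std::vector<DrawBatch> batches;
    for (const Surface& s : surfaces) {
        if (!batches.empty()) {
            DrawBatch& last = batches.back();
            const bool sameKey = last.materialId == s.materialId
                              && last.textureIndex == s.textureIndex;
            const bool contiguous = last.firstIndex + last.indexCount == s.firstIndex;
            if (sameKey && contiguous) {
                last.indexCount += s.indexCount;  // extend the previous batch
                continue;
            }
        }
        batches.push_back(DrawBatch{s.materialId, s.textureIndex,
                                    s.firstIndex, s.indexCount});
    }
    return batches;
}
```

For example, four surfaces where the first two share both material and texture index collapse into three batches, with the first batch covering both index ranges.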

The https://github.com/VReaperV/Daemon/tree/material-clusters branch was an attempt at merging surfaces by using texture arrays and binding textures per material (with a texture layer and scale used in the shader for each relevant one). It worked alright on my end (sans some bugs at surface edges), but didn't really seem to give a performance benefit. On Mesa/AMD it was slower than the current material system, as tested by @illwieckz; however, maybe using per-stage material data would help with this. It does, however, seem that I overcomplicated that branch (it even copies vertices, not just the indexes, and for each view), and the surface merging can probably be better achieved in another pass: the cull and surface processing shaders are already very fast, especially if the subgroup extension is supported.

VReaperV, Nov 19 '24 18:11

I guess this is probably obsolete but in case there's still anything to be tested, I now have a test suite...

slipher, Jul 07 '25 02:07