Daemon icon indicating copy to clipboard operation
Daemon copied to clipboard

Support graphics cards without `ARB_half_float_vertex`

Open illwieckz opened this issue 1 year ago • 15 comments

This is to address those two issues:

  1. Some graphics devices don't supports ARB_half_float_vertex like the Intel GMA Gen3 architecture with the Mesa driver.
  2. Some graphics devices don't supports ARB_half_float_vertex but drivers may provide an unplayable slow emulation, like the ATi r300 architecture with the Mesa driver.
    • https://github.com/DaemonEngine/Daemon/issues/1172

Those changes purposes to provide an alternative code path when ARB_half_float_vertex is not available or when the r_arb_half_float_vertex cvar is disabled on purpose to avoid a slow emulation provided by the driver.

It is the next step over preparatory work that was already merged:

  • Some improvements of our type system, helping the compiler to report more type errors and then making the conversion easier with the less risk of introducing silent errors: https://github.com/DaemonEngine/Daemon/pull/728
  • Some enablement of GL 2.1 on the said Intel graphics chip so we can know GL 2.1 is supported but catch the lack of the half float vertex extension, meaning that if we implement a workaround one day (that is the day today), we would already have everything else already in place for it: https://github.com/DaemonEngine/Daemon/pull/478

This branch doesn't break compatibility with the game but I implemented it over the illwieckz/image-fistcreen/sync branch anyway:

  • https://github.com/DaemonEngine/Daemon/pull/1145

I did it that way because this other branch also helps running the game on this kind of hardware (to not upload textures too big for such hardware) and we better test this hardware over for-0.55.0/sync because of all compatibility-breaking changes purposed for performance that were merged there. The illwieckz/image-fistcreen/sync branch itself is already implemented over for-0.55.0/sync because of compatibility breaking.

This branch also includes the GL_DEPTH_CLAMP disablement when GL_ARB_depth_clamp to avoid spamming the console with warnings when running the game on ATi r300 graphics card, to make debugging easier:

  • https://github.com/DaemonEngine/Daemon/pull/1173

~~⚠️🚧️ This branch is in very early state! 🚧️⚠️~~ ~~⚠️🚧️ It's even not ready to review! 🚧️⚠️~~ ~~⚠️🚧️ This may be full of useless halffloat-to-float-to-halffloat conversions! 🚧️⚠️~~ ~~⚠️🚧️ I even haven't tested if it doesn't break the usual code path! 🚧️⚠️~~

~~I push this branch now because it starts to work and this is so precious I don't want the team to lose that work is something happens to me!~~

It's now ready to be reviewed, it works for me with both code paths, and is rebased on master.

illwieckz avatar Jun 04 '24 03:06 illwieckz

For unknown reasons, this is broken:

  • mouse pointer
  • minimap
  • console text

illwieckz avatar Jun 11 '24 01:06 illwieckz

For unknown reasons, this is broken:

  • mouse pointer
  • minimap
  • console text

It was just some stupid copy-paste mistake.

Now it looks like everything is working.

illwieckz avatar Jun 11 '24 06:06 illwieckz

Now that I verified the branch works, the branch is now targeting master.

It is ready to review.

illwieckz avatar Jun 11 '24 07:06 illwieckz

The changes are a bit verbose because I renamed all related half float variables by prefixing them with f16 (for example, half-float texCoords becomes f16TexCoords, while float texCoords is named texCoords). Not only it helps to avoid confusion when reading, but it makes far easier to me to implement this code, because I just had to rename texCoords to f16TexCoords to have my compiler reporting me all the lines requiring an alternate code.

illwieckz avatar Jun 11 '24 18:06 illwieckz

As an experiment, I implemented a proof of concept of automatic vertex translation at VBO upload time in the branch slipher/vertex-translation. I'm not sure if that approach is better, but I was curious if it would work.

It doesn't translate every VBO because so far it only acts on "static" VBOs (ones fixed at load time). But I imagine it would be enough to get good performance on the cards with poor implementation quality of the half-float extension.

If we were to productionize this, the vertex translation process should probably be combined into R_CopyVertexData which already appears to do some similar work.

Usage of half floats in the VBOs can be toggle with r_avoidHalfFloatVertex.

slipher avatar Jun 11 '24 23:06 slipher

The slipher/vertex-translation branch is enough to recover performance on ATI r300:

unvanquished_2024-06-12_022513_000

As a remind, I would like to not only accelerate r300 (that has a slow half-float vertex driver emulation) but also enable some Intel cards (that also support GL 2.1 but don't provide half-float vertex and provide no emulation).

A good way to emulate with Mesa a card only having GL 2.1 and not having half float vertexes is to set some environment variable this way:

MESA_GL_VERSION_OVERRIDE='2.1' MESA_EXTENSION_OVERRIDE='-GL_ARB_half_float_vertex' ./daemon +devmap vega

This should work with any Mesa driver, even llvmpipe on Windows:

  • https://github.com/pal1000/mesa-dist-win

More details about Mesa environment variables:

  • https://docs.mesa3d.org/envvars.html

illwieckz avatar Jun 12 '24 00:06 illwieckz

I don't mind that much the implementation, as long as it works.

What I like in my branch is that I renamed the half float variables with an explicit prefix, and I find it better for the code to read and debug, this may be something I would redo even if we merge another implementation.

We may even decide to convert in the future some current half float code to float, to avoid useless round trips between float and half float. I especially think about the model code: some parts of the model CPU fallback code may process half floats, but I want this code to be as fast as possible since it's already a fallback for the GPU not being powerful enough…

illwieckz avatar Jun 12 '24 00:06 illwieckz

What is currently broken in slipher/vertex-translation if I make the extension not required and it is missing, is some 2D things like text and images that are either garbage or distorted in wrong ways. I'm fine with the idea of translating the data when needed, especially if it prevents us to duplicate code everywhere, and if it even allows us to add new stuff using half float and relying on a generic converter when needed.

illwieckz avatar Jun 12 '24 01:06 illwieckz

@slipher do you have some update on your slipher/vertex-translation branch? I like your approach, but it was not complete as far as I know.

I will have access to the machine with the Intel card without half-float vertex for two weeks starting with the next week.

If you're not available to work on this for now, I may merge my implementation and we can revert it in the future to merge your implementation instead. I would prefer to have yours, but we can take time to improve it.

illwieckz avatar Jul 30 '24 18:07 illwieckz

@slipher do you have some update on your slipher/vertex-translation branch? I like your approach, but it was not complete as far as I know.

I made some progress in implementing a more complete version of the idea, which eliminates some of the unnecessary layers in preparing vertex data (it's often copied about 5 times before reaching the VBO). IIRC I finished 2 of the 3 model formats. Got sidetracked by all the bugs I was finding in the renderer. I'll try to dig that out that code in the next few days.

slipher avatar Jul 31 '24 18:07 slipher

Got sidetracked by all the bugs I was finding in the renderer.

I know what it is. 🤣️

Thanks for the update! 🙂️

illwieckz avatar Jul 31 '24 19:07 illwieckz

This branch makes the Intel GMA 3 working:

unvanquished_2024-08-03_195448_000

I needed some other minor patches to not compile some shaders for disabled features like bloom, but nothing more related to vertex float was needed.

The output of glxinfo -B:

name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Mesa Project (0x8086)
    Device: i915 (chipset: G33) (0x29c2)
    Version: 24.0.9
    Accelerated: yes
    Video memory: 384MB
    Unified memory: yes
    Preferred profile: compat (0x2)
    Max core profile version: 0.0
    Max compat profile version: 2.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 2.0
OpenGL vendor string: Mesa Project
OpenGL renderer string: i915 (chipset: G33)
OpenGL version string: 2.1 Mesa 24.0.9-0ubuntu0.1
OpenGL shading language version string: 1.20

OpenGL ES profile version string: OpenGL ES 2.0 Mesa 24.0.9-0ubuntu0.1
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 1.0.16

illwieckz avatar Aug 03 '24 19:08 illwieckz

You can see my current progress on the branch slipher/runtime-vbo-layout. The dynamically generated vertex attribute layout, which may be configured to use half float or not, is used for IQM, MD3, and MD5 models. It's not ready to test as other VBOs are not ported yet. Remaining steps:

  • Investigate the vertex attribute-related warning spam on station15. IIRC this is caused by some code which creates some verts with the qtangent attribute, then later overwrites that data with the incompatible orientation attribute. I should be careful to avoid breaking this.
  • Use the new vertex attribute layout method for the other static VBOs.
  • Change the dynamic VBO's vertex struct to (always) use float instead of f16_t. This is probably better anyway. Data in the dynamic VBO is only used once and (I believe; will double check) is usually small. Trying to compress it seems pointless. Also tess.verts is sometimes used for data which is not actually sent to the GPU, so avoiding useless conversions should be a win there.

slipher avatar Aug 09 '24 06:08 slipher

Great! I like the way it looks!

As a side note, I prefer the cvar being named r_arb_half_float_vertex (not r_ext_half_float_vertex) as the extension is ARB_half_float_vertex, and ordered alphabetically like the others.

illwieckz avatar Aug 09 '24 18:08 illwieckz

I spelled out my thoughts on the extension cvar naming. To me it looks like a case of cargo culting gone out of control. Maybe some of the first extensions were EXT ones, which makes some sense as part of the cvar name. But ARB really doesn't.

slipher avatar Aug 10 '24 05:08 slipher

I rebased on latest master, which reduced the diff size as some code was just NUKED since.

I added a detection of early RV300 hardware were we should prefer non-half float, and added a workaround cvar that can be disabled to test the r300 driver half-float performance when not using such workaround.

This looks ready to me.

Regarding to the slipher/runtime-vbo-layout branch, I do believe it is the future, but I prefer to see that branch rebased on this one. This one also brings some optimizations and some renaming that makes easier to follow the code and I would like to merge that in all cases anyway.

We can already merge that as it already works, without waiting for slipher/runtime-vbo-layout. While I do believe such deeper redesign is the future, it can wait a bit.

Basically a rebase of slipher/runtime-vbo-layout would have to add an extra parent commit to unsplit the float/half-float if/else code paths. I can help at rebasing it.

illwieckz avatar Oct 01 '24 07:10 illwieckz

I just looked at my vertex layout branch again and it's actually 95% done. The big thing it was waiting on was the autosprite rework (so that we no longer have overlapping attributes). I can have the PR ready in the next day or two.

slipher avatar Oct 07 '24 00:10 slipher

Great!

illwieckz avatar Oct 07 '24 11:10 illwieckz

Obsoleted by #1344:

  • https://github.com/DaemonEngine/Daemon/pull/1344

illwieckz avatar Oct 10 '24 23:10 illwieckz

We still need the commit to add a workaround flipping off half float vertex for the driver with poor quality of implementation.

slipher avatar Oct 11 '24 00:10 slipher

Yes, I'm on it. :-)

illwieckz avatar Oct 11 '24 00:10 illwieckz

See:

  • https://github.com/DaemonEngine/Daemon/pull/1347

illwieckz avatar Oct 11 '24 00:10 illwieckz