High power usage compared to Woof! and DSDA-DOOM, especially in low-intensity scenarios

Open JustinWayland opened this issue 11 months ago • 15 comments

Done in opening of E1M1 of Doom. Power usage was collected using turbostat on Linux using a Radeon 780M. All source ports were set to 1080p60fps for this test.

| Source Port | Power Usage (W) |
|---|---|
| Idle (None Loaded) | 3-7 |
| Woof! | 9.5 |
| DSDA-DOOM (OpenGL) | 7-9 |
| Helion | 18.62 |

I did another test for MAP28 of Struggle: Antaresian Legacy.

| Source Port | Power Usage (W) | Notes |
|---|---|---|
| Idle | 3-7 | |
| Woof! | 14.6 | Did not render at 60fps |
| DSDA-DOOM (OpenGL) | 13-14.5 | |
| Helion | 19-20 | |

As can be seen, Woof! and DSDA-DOOM draw noticeably more power in the high-stress scenario than in the low-stress one, which narrows their gap with Helion. In the low-stress scenario, however, they utterly crush Helion.

I'm not sure how much of this is Helion being written in C#, how much of it is overhead in OpenTK compared to Helion, how much of this is due to better utilization of system resources by Helion, and how much is stuff Helion can improve in its codebase.

JustinWayland avatar Feb 25 '25 03:02 JustinWayland

As far as I know, the normal way OpenTK (and probably GLFW underneath) waits for the next frame is a spin wait, so Helion will always use a full CPU core for whichever thread is running the render loop.

lemming104 avatar Feb 25 '25 03:02 lemming104

That would probably do it. If only there were an easy way to plumb SDL rendering into Helion without changing a large part of the codebase. Judging from the other source ports, which do use it, SDL does not use a spin wait.

We could also implement our own frame-waiting algorithm, but that sounds like a disaster.

JustinWayland avatar Feb 25 '25 03:02 JustinWayland

One question: is there any reason we are not calling this when we enable VSync in the config other than that switching VSync back off might require a restart?

JustinWayland avatar Feb 25 '25 03:02 JustinWayland

Whatever OpenTK is doing, it seems to do differently depending on whether we're windowed or full-screen, at least on Windows.

CPU usage on my Win11 machine is noticeably lower when running windowed or in a full screen borderless window (available in the dev branch) compared to full screen exclusive.

lemming104 avatar Feb 25 '25 04:02 lemming104

I'll keep this open for now just in case we find some stupid optimization that ends up saving enough power that any remaining difference can be chalked up to C#. A big discovery would be a way to bypass that spin wait so the render thread gives up more CPU time, which would help reduce power consumption.

Some options, from most likely to work to least likely to work:

  1. Use SDL to render Helion instead of OpenTK, as SDL does not seem to use a spin wait when waiting to present a frame. This would have the obvious disadvantage of forcing us to change large chunks of the code, which could obscure any impact on power consumption, but the obvious advantage of potentially being the least janky.
  2. Call OpenTK.Windowing.GraphicsLibraryFramework.SwapInterval with a value of 1 when VSync is turned on. According to OpenTK and GLFW's documentation, this would cause the GPU driver to block when we call OpenTK.Windowing.GraphicsLibraryFramework.SwapBuffers, which we already do somewhere in Client/Client.cs. This has the advantage of disturbing very little code. The only potential issues are that some GPU drivers will not respect a call to SwapInterval with a value of zero, which would mean that turning off VSync would require a restart under those GPUs, and that there could be problems if the monitor's refresh rate is not a multiple of Helion's refresh rate.
  3. Find a way to call glFinish() after calling OpenTK.Windowing.GraphicsLibraryFramework.SwapBuffers. This has the big disadvantage of not allowing the CPU and GPU to work in parallel, which could have performance implications, but it could also avoid some jank inherent to the previous item.

This is all I can think of after some research. I will leave it up to the Helion maintainers to decide which to implement, or whether to implement any of them at all.

JustinWayland avatar Feb 25 '25 04:02 JustinWayland

I think OpenTK already sets the swap interval, unless I'm misreading. Source from NativeWindow below:

        /// <summary>
        /// Gets or sets the VSync state of this <see cref="NativeWindow"/>.
        /// </summary>
        /// <value>
        /// The VSync state.
        /// </value>
        public VSyncMode VSync
        {
            get
            {
                if (Context == null)
                {
                    throw new InvalidOperationException("Cannot control vsync when running with ContextAPI.NoAPI.");
                }

                return _vSync;
            }

            set
            {
                if (Context == null)
                {
                    throw new InvalidOperationException("Cannot control vsync when running with ContextAPI.NoAPI.");
                }

                // We don't do anything here for adaptive because that's handled in GameWindow.
                switch (value)
                {
                    case VSyncMode.On:
                        Context.SwapInterval = 1;
                        break;

                    case VSyncMode.Off:
                        Context.SwapInterval = 0;
                        break;
                }
                _vSync = value;
            }
        }

Explicitly calling OpenTK.Windowing.GraphicsLibraryFramework.GLFW.SwapInterval(1); after setting VSync also appears to have no impact on CPU utilization in exclusive full-screen mode.

lemming104 avatar Feb 25 '25 04:02 lemming104

So I guess the only option left is to do the messy swapout of OpenTK to SDL and see if that improves anything. Otherwise, I'm out of ideas.

JustinWayland avatar Feb 25 '25 04:02 JustinWayland

It doesn't have anything to do with OpenTK vs SDL. I know for sure dsda uses sleep to try to do the frame limiting which is why it uses less CPU: https://github.com/kraflab/dsda-doom/blob/4f992b7b550306769bc2196bcb42ca84c3236c2c/prboom2/src/dsda/time.c#L104 https://github.com/kraflab/dsda-doom/blob/4f992b7b550306769bc2196bcb42ca84c3236c2c/prboom2/src/dsda/time.c#L100C7-L100C15

Limiting the FPS with the built in OpenTK method doesn't do this because sleeping is not a good solution, and they decided having a good method is outside of their scope. FPS limiting/timing is pretty complicated. For this reason I don't use the built in frame limiter in any game, I use RTSS. RTSS will drop the power consumption. The other thing to keep in mind is the power consumption still may be higher on large maps since Helion needs to process the BSP in a separate thread to support the automap. This can be disabled with render.automapbspthread 0 in the console.

nstlaurent avatar Feb 25 '25 10:02 nstlaurent

I'm on Linux, sadly, so RTSS is out of the question for me. However, I noticed something interesting.

I just set my display's refresh rate from 165 Hz to 60 Hz and tested again. Now the tables have turned completely: Helion in the low-intensity scenario drew about as much power as my browser, and in the high-intensity scenario drew about 7-10W. It is now very hard to distinguish from idle.

Dunno what that says, but it seems to be a good workaround... for now.

EDIT: Decided to check other source ports. Woof! also uses sleep in its inner loop, though it seems to sleep for 500us at a time and doesn't bother at all if there is less than 1000us until the next frame. I checked GZDoom's source, and it also uses sleep. Both, however, make sure to stop sleeping close to the deadline.

https://github.com/fabiangreffrath/woof/blob/40d413183cebe7bbdcb4974c8b4fdd0c112d95fd/src/i_video.c#L898 https://github.com/ZDoom/gzdoom/blob/e5cf79fecb325d3620fdec7a3422a453329559d9/src/common/rendering/v_framebuffer.cpp#L275
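A minimal C# sketch of the "sleep in small chunks, then stop sleeping near the deadline" pattern described above. This is not code from Woof!, GZDoom, or Helion; `ChunkedFrameWait`, `ShouldSleep`, and `WaitUntil` are hypothetical names. Because `Thread.Sleep` only accepts whole milliseconds, the 500us/1000us figures are scaled up here to a 1 ms chunk and a 2 ms cutoff, which is an assumption and not a tuned value.

```csharp
using System.Diagnostics;
using System.Threading;

public static class ChunkedFrameWait
{
    // Assumption: stop sleeping once less than 2 ms remain (scaled from Woof!'s 1000us).
    public const long MinRemainingMicros = 2000;

    // True when there is enough time left to risk an OS sleep.
    public static bool ShouldSleep(long remainingMicros) => remainingMicros >= MinRemainingMicros;

    // Blocks until Stopwatch.GetTimestamp() reaches deadlineTimestamp.
    public static void WaitUntil(long deadlineTimestamp)
    {
        while (true)
        {
            long remainingTicks = deadlineTimestamp - Stopwatch.GetTimestamp();
            if (remainingTicks <= 0)
            {
                return;
            }

            long remainingMicros = remainingTicks * 1_000_000 / Stopwatch.Frequency;
            if (ShouldSleep(remainingMicros))
            {
                Thread.Sleep(1);      // give the core back to the OS for one chunk
            }
            else
            {
                Thread.SpinWait(64);  // too close to the deadline: spin briefly instead
            }
        }
    }
}
```

The deadline would typically be the previous frame's timestamp plus one frame period in `Stopwatch` ticks. How often `Thread.Sleep(1)` oversleeps depends on the OS timer resolution, so the cutoff would need measuring on each platform.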

EDIT 2: We could call SDL_Delay instead of the unreliable Thread.Sleep(), which a lot of the above source ports already do and would be guaranteed to be cross-platform. The main choice seems to be how much slack room to allow. Personally I would advocate for the unpopular 4ms, in case some poor Linux user is running Helion on Debian with the old default kernel timer frequency of 250 Hz (one tick every 4ms).

JustinWayland avatar Feb 25 '25 18:02 JustinWayland

You can play around with it by importing directly from SDL2 (SDL_Delay returns void and takes a 32-bit unsigned millisecond count):

    [DllImport("SDL2.dll", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true)]
    public static extern void SDL_Delay(uint ms);

Total target sleep time: var target = 1000.0 / m_config.Render.MaxFPS.Value;

I was still unable to get reliable results. On Windows, it should just be calling timeBeginPeriod(1)/timeEndPeriod(1) and then Sleep underneath.
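As a rough illustration of the arithmetic discussed here, the sketch below computes the per-frame budget from the FPS cap, the slack implied by a timer frequency, and how many whole milliseconds are left to sleep. `FrameBudget` and its methods are hypothetical names, not Helion or SDL APIs, and the 3 ms of frame work in the usage note is an invented figure for illustration.

```csharp
using System;

public static class FrameBudget
{
    // Time available per frame, e.g. 1000.0 / 60 is about 16.67 ms.
    public static double FrameTimeMs(int maxFps) => 1000.0 / maxFps;

    // Length of one timer tick, e.g. 1000.0 / 250 = 4 ms.
    public static double TickPeriodMs(int timerHz) => 1000.0 / timerHz;

    // Whole milliseconds that can be slept after subtracting the work already
    // done this frame and the slack reserved for oversleeping. Never negative.
    public static uint SleepableMs(double frameTimeMs, double elapsedMs, double slackMs)
    {
        double sleepable = frameTimeMs - elapsedMs - slackMs;
        return sleepable > 0 ? (uint)Math.Floor(sleepable) : 0u;
    }
}
```

With a 60 FPS cap, 3 ms of work, and 4 ms of slack, `SleepableMs` yields 9 ms. With a 165 FPS cap and the same inputs the budget is about 6.06 ms, so nothing is left to sleep and the wait would be spent spinning.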

Also if we never sent the invite to the discord group: https://discord.gg/zJ83Npsr

nstlaurent avatar Feb 25 '25 21:02 nstlaurent

Just happened to see this issue. In OpenTK 4.8.0 there is Utils.AccurateSleep that tries to OS sleep but will resort to spin waits if it's not confident the sleep will wake up in time. https://github.com/opentk/opentk/blob/4.8.0/src/OpenTK.Core/Utility/Utils.cs#L29-L48

This version of OpenTK also calls timeBeginPeriod(8) on Windows to get more accurate OS sleeps, so that it can rely on them more.
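For readers who want the general shape of such a hybrid wait, here is a minimal sketch written from the description above. It is not OpenTK's implementation (see the linked source for that); `HybridSleep`, `WholePeriods`, and `Sleep` are hypothetical names, and holding back one scheduler period as a safety margin is an assumption made here.

```csharp
using System.Diagnostics;
using System.Threading;

public static class HybridSleep
{
    // Number of whole scheduler periods that fit inside the requested wait.
    public static int WholePeriods(double waitMs, int schedulerPeriodMs)
    {
        if (waitMs <= 0 || schedulerPeriodMs <= 0)
        {
            return 0;
        }
        return (int)(waitMs / schedulerPeriodMs);
    }

    // Waits for roughly `seconds`: OS sleep for the coarse part, yield-spin for the rest.
    public static void Sleep(double seconds, int schedulerPeriodMs)
    {
        long target = Stopwatch.GetTimestamp() + (long)(seconds * Stopwatch.Frequency);

        // Assumption: keep one period in reserve in case the OS wakes us late.
        int periods = WholePeriods(seconds * 1000.0, schedulerPeriodMs);
        if (periods > 1)
        {
            Thread.Sleep((periods - 1) * schedulerPeriodMs);
        }

        // Finish the wait without sleeping so the deadline is not overshot.
        while (Stopwatch.GetTimestamp() < target)
        {
            Thread.Yield();
        }
    }
}
```

With an 8 ms scheduler period, a 16.67 ms frame at 60 FPS contains two whole periods, so this sketch sleeps for one and spins for the remainder. A frame of about 6 ms at 165 FPS contains none and is spent entirely in the yield loop.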

NogginBops avatar Jun 13 '25 09:06 NogginBops

It needs to be changed for Linux, but that just requires reading a file in /proc/sys/. I’ll figure out which one once I get home, I have an external commitment right now.

JustinWayland avatar Jun 13 '25 16:06 JustinWayland

What needs to be changed for linux?

NogginBops avatar Jun 15 '25 16:06 NogginBops

Dunno what I was thinking, the file I was thinking of was moved to a new location that is only accessible by root.

JustinWayland avatar Jun 15 '25 17:06 JustinWayland

Just updated OpenTK to 4.9.4 on dev so now we have access to Utils.AccurateSleep.

nstlaurent avatar Jun 15 '25 17:06 nstlaurent