Daemon icon indicating copy to clipboard operation
Daemon copied to clipboard

qcommon: let the framerate flirt with the limits

Open illwieckz opened this issue 1 year ago • 32 comments

Let the framerate flirt with the limits.

Basically the idea is to not sleep the remaining time to fill the expected frametime, but to let enough time after sleeping to process events in a loop until the expected frametime is reached.

It makes the framerate less jaggy, and actually provides to the user what they expect.

illwieckz avatar May 15 '24 19:05 illwieckz

Some before/after screenshots…

1. common.framerate.max 100, 100 fps is expected.

Before, notice the 99 fps (it never reached 100 fps) and the jaggy framerate line:

unvanquished_2024-05-15_203847_000

After, notice the 100 fps and the smooth framerate line:

unvanquished_2024-05-15_204113_000

2. common.framerate.max 0, 333 fps is expected.

Before, notice the 325 fps (it never reached 333 fps) and the jaggy famerate line:

unvanquished_2024-05-15_203913_000

After, notice the 333 fps and the smooth framerate line:

unvanquished_2024-05-15_204153_000

2. common.framerate.max -1, 1000 fps is expected.

Before, notice the 932 fps (it never reached 1000 fps) and the jaggy framerate line:

unvanquished_2024-05-15_204002_000

After, notice the 1000 fps and the smooth framerate line:

unvanquished_2024-05-15_204308_000

illwieckz avatar May 15 '24 19:05 illwieckz

This was tested on high performance CPU and GPU profile with lowes preset and low resolution (expecting to get very high framerate), with framerate flirting with limits like 1000 fps.

This was also tested on low performance CPU and GPU profile with ultra preset and 4K resolution (expecting to get very low framerate and the game struggling to render), with framerae flirting with limits like 50 fps or 30 fps.

In all case the game gave me the framerate I asked.

In the rare cas I got more, I got a bit more, which is expected since when frametime = 1000 / framerate, then frametime * 1000 < framerate (precision loss in integer division), and which is wanted since if you ask the game a framerate you want it to match with your screen, you don't want to get your screen renders two times the same frame. You better want to waste a rendered frame than to render twice the same one.

illwieckz avatar May 15 '24 19:05 illwieckz

Basically the idea is to not sleep the remaining time to fill the expected frametime, but to let enough time after sleeping to process events in a loop until the expected frametime is reached.

Are you saying you are going to put it in a busy loop?

In the rare cas I got more, I got a bit more, which is expected since when frametime = 1000 / framerate, then frametime * 1000 < framerate (precision loss in integer division), and which is wanted since if you ask the game a framerate you want it to match with your screen, you don't want to get your screen renders two times the same frame. You better want to waste a rendered frame than to render twice the same one.

Isn't that what vsync is for?

slipher avatar May 16 '24 01:05 slipher

Are you saying you are going to put it in a busy loop?

In some way, yes. We have a problem here, this code was designed for a time when people were lucky to reach 30fps.

A sleep function based on milliseconds here is not precise enough.

With our 333 high cap, it means 3.003 ms of sleep between each frame if any compute was not consuming any time.

There are 240MHz screens out there, it means 4.166ms between each frames if any compute was not consuming any time.

Isn't that what vsync is for?

Yes? I'm not saying this should be used instead of vsync, I'm just using this example to show how random the current implementation is and can't give what the user asked for. It's always randomly sleeping and produces a unpredictable framerate.

I'm even wondering if the original code was making players with higher framerate advantaged against those with lower framerate. As with higher framerate the game processes the input more frequently, as with lower framerate the game just sleeps while the player inputs. With two people clicking +fire at the same time, the high fps player may probably win? One should have to dive deeper in the code to check that, but the surface really, really stinks.

illwieckz avatar May 16 '24 08:05 illwieckz

Well, about the busy loop, the “busy loop” calls Com_EventLoop() and IN_Frame() on every step, which is probably not really a “busy loop”.

illwieckz avatar May 16 '24 08:05 illwieckz

Are you saying you are going to put it in a busy loop?

In some way, yes.

That's bad. For example if you are using a laptop, it can drain the battery quickly. Please don't make a busy loop!

I'm even wondering if the original code was making players with higher framerate advantaged against those with lower framerate. As with higher framerate the game processes the input more frequently, as with lower framerate the game just sleeps while the player inputs. With two people clicking +fire at the same time, the high fps player may probably win? One should have to dive deeper in the code to check that, but the surface really, really stinks.

Yes, the player with a higher framerate would be expected to be able to respond faster in general. Stuff will show up faster on the screen, packets will be sent out sooner, etc. Mostly we can't do anything about that.

There is one thing that checking the input more frequently could improve, which is the responsiveness of movement. The forwardmove/rightmove/upmove part of usercmd_t is supposedly pro-rated to the proportion of the frame that the key was held, with the assumption that it was pressed down at the same wall clock time that the input loop detected it. Still, there is no sense in checking more frequently than once per millisecond! I believe other controls won't be affected by more frequent input checking, as they are just booleans (or commands), and the usercmd_t packet will only be sent once per frame.

Using a more precise sleep function is not a bad idea, as long as it is implemented by actually sleeping rather than spinning the CPU.

slipher avatar May 16 '24 17:05 slipher

That's bad. For example if you are using a laptop, it can drain the battery quickly. Please don't make a busy loop!

To be fair, this is a game which is going to have excessive power constraints anyways.

Using a more precise sleep function is not a bad idea, as long as it is implemented by actually sleeping rather than spinning the CPU.

Using a more precise sleep function is not a bad idea, as long as it is implemented by actually sleeping rather than spinning the CPU.

https://stackoverflow.com/questions/13397571/precise-thread-sleep-needed-max-1ms-error

Seems like this is actually quite difficult.

It would be interesting to note how long each "busy loop" takes on the average case.

DolceTriade avatar May 16 '24 19:05 DolceTriade

Also I'm not sure if this is related to this change, but when testing this change, if I change the limit from 125 to 144, the game actually limits me to 167.

DolceTriade avatar May 16 '24 19:05 DolceTriade

Also in my tests, we never actually sleep...

DolceTriade avatar May 16 '24 19:05 DolceTriade

https://stackoverflow.com/questions/13397571/precise-thread-sleep-needed-max-1ms-error

Very interesting thread! And a bit scary:

According to this article, scheduler latency could be anywhere between 10-30ms on Linux.

And there are already specific code in our code base for Windows because of the default sleep being bad.

For 60fps, we need a sleep function that can wait 16ms with 0.6ms error. For 144 fps, we need a sleep function that can wait 6ms with .9ms error, and for 240fps we need a sleep that can wait for 4ms with 0.1ms error. Ouch.

It probably doesn't make sense to start sleeping when requesting framerate above 60fps, and someone playing the game on laptop on battery doesn't want to have more than 60 fps. If the player requests more than 60fps, it will not be the useless CPU spinning that will draw the batter, but the GPU crunching insane amount of numbers.

illwieckz avatar May 16 '24 19:05 illwieckz

Also I'm not sure if this is related to this change, but when testing this change, if I change the limit from 125 to 144, the game actually limits me to 167.

The various name discrepancies you pointed out (like the sleep variable not used for sleeping) comes from the fact I tried many attempts of achiving this… And in 90% of my attempts (not with this one on my end), I was experiencing curious fps jumps.

Like, I could have 125 fps or 200 fps but nothing in between whatever the value I would ask… And I've set all the options I know to disable screen sync, and I got it with the same environment I use while not getting it with this branch.

illwieckz avatar May 16 '24 20:05 illwieckz

Also, testing this branch with a CPU governor being ondemand is like not testing this branch, the randomness of the CPU governor will be as wrong as the sleep randomness this branch attemps to avoid.

illwieckz avatar May 16 '24 20:05 illwieckz

Well, not as wrong… on master branch with ondemand CPU governor I get a close-to-flat line around 60fps, with 120 fps it stats being bad, after that it's just garbage.

With this branch with ondemand it's still smooth at 300fps, and really smooth. But I when approching the 1000fps cap, I can't get more than 900fps, the CPU governor is just breaking everything. It's not because the hardware can't do it, if I remove the 1000 fps cap it goes well higher.

illwieckz avatar May 16 '24 20:05 illwieckz

It probably depends on the environment (other software running around), I remember that yesterday I could witch from powersave to ondemand and just get flat line in both case, with an immediate switch to the different levels.

illwieckz avatar May 16 '24 20:05 illwieckz

Can you verify if we sleep at all? In my tests we don't, regardless of the FPS limit.

DolceTriade avatar May 16 '24 20:05 DolceTriade

I pushed a simpler implementation. It is purposed to only sleep if there is more than 2ms to sleep (under 500fps) and never sleep the last 1ms (if the sleep function is precise enough to wake up with less than 1ms error…).

illwieckz avatar May 16 '24 20:05 illwieckz

On my end it is still smooth at 333fps with ondemand governor and a crowded desktop.

illwieckz avatar May 16 '24 20:05 illwieckz

It is also smooth at 60fps on powersave CPU profile with low GPU profile.

illwieckz avatar May 16 '24 20:05 illwieckz

Ok, the weirdness around maxFPS = 144 is around integer division.

we do:

minMsec = 1000 / maxFPS

1000 / 144 = 6.9 but we truncate it to 6. 1000 / 6 =~ 167.

The new code is way simpler and easier to follow. Let me verify it actually sleeps.

DolceTriade avatar May 16 '24 21:05 DolceTriade

We actually sleep. Nice.

DolceTriade avatar May 16 '24 21:05 DolceTriade

1000 / 144 = 6.9 but we truncate it to 6. 1000 / 6 =~ 167.

Yes, the same way starting with 251 you jump from 250 to 333 directly because 1000/251 is 3.9.

We actually sleep. Nice.

Good! Then it works.

illwieckz avatar May 16 '24 21:05 illwieckz

Yes and the laptop user on battery with 60fps would have a budget of 16ms, sleeping for 14ms and being busy for 2ms, actually sleeping 840ms per second (if the rendering is instantaneous, of course)

illwieckz avatar May 16 '24 21:05 illwieckz

Last thing to mention is that, this PR effectively doubles CPU usage when locked at 125fps for me, but does not affect CPU when maxed out.

On master, at 125fps, game uses ~15% of a core. With this PR, we use ~30% of a core. On master at 333fps, game uses ~60% of a core. With this PR, we use ~60% of a core.

DolceTriade avatar May 16 '24 21:05 DolceTriade

I think if we are worried about power saving, lots of new games have a "power saving mode" where we can modify the max 50ms sleep and make it larger.

50ms sleep means 20fps (if the rest of the compute is instantaneous), I doubt we need to increase this limit.

illwieckz avatar May 16 '24 21:05 illwieckz

On master, at 125fps, game uses ~15% of a core. With this PR, we use ~30% of a core. On master at 333fps, game uses ~60% of a core. With this PR, we use ~60% of a core.

Interesting, what do you get at 60fps?

illwieckz avatar May 16 '24 21:05 illwieckz

What I meant was instead of waking up every 50ms, we can sleep more at the cost of more variable FPS.

DolceTriade avatar May 16 '24 21:05 DolceTriade

Ah yes, purposed variable rate… Not updating the screen if nothing changes, for example.

illwieckz avatar May 16 '24 21:05 illwieckz

Interesting, what do you get at 60fps?

On master at 60fps, game uses ~6% of a core. With this PR at 60fps, game uses ~15% of a core.

DolceTriade avatar May 16 '24 21:05 DolceTriade

The trade off here being CPU cycles for increased responsiveness and smoother FPS?

DolceTriade avatar May 16 '24 21:05 DolceTriade

Can you look at the CPU usage with 60fps and 120fps with this line?

int sleep = std::max( std::min( minMsec - msec, 50 ) - 1, 0 );

illwieckz avatar May 16 '24 21:05 illwieckz