[Feature Request] Low latency output on Windows via ASIO/WASAPI exclusive
High audio latency (100ms+) has plagued apps/games on Windows for a long time, yet people rarely notice it, let alone measure it the way Matt Gore, HeSuVi developer and Battle(non)sense have.

So I've been wondering if ASIO could be implemented into OpenAL Soft directly, since Crystal Mixer already did something like that, but AFAIK it's only capable of virtualizing the multichannel audio mix.
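Alternatively, someone even modified OpenAL Soft to use WASAPI in exclusive mode, which I've tested and confirmed does make a difference (tho I'm yet to measure it), so I forked it here: https://github.com/ThreeDeeJay/openal-soft-WASAPI-exclusive/commit/9cd722fc9a80181cc9c86db9a0ec86728dafb7a3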
Perhaps a flag to switch to exclusive mode in the main branch would be more feasible since it just seems to require a couple line edits (tho it would probably need some improvement so it's not restricted to sample-type=int16 and period_size is automatically set to the lowest supported by the sound card, as well as minimal/no mixahead or any other bottlenecks to ensure lowest possible latency).
Either option should hopefully allow ultra low latency in the thousands of games that are at least potentially supported by OpenAL Soft, using the sound card's native ASIO driver or ASIO4ALL. I think audio would also be bit-perfect (at least with WASAPI exclusive) since it bypasses the Windows mixer. So perhaps eventually including both would give people options based on their needs.
I'm curious how much of the latency is a result of using shared mode or non-ASIO output. "Button to audio" latency with random games doesn't say much, since it's also including input latency (the time from physically pressing a button to the OS detecting the input, then to the process detecting the input), and logic/frame latency (the time from the process getting input to processing a new logic frame, and from a logic frame to updating audio state, which can be at different rates), and only then getting to the audio latency.
OpenAL Soft itself will add about 50ms on average, given the default 20ms period size and 60ms buffer. Certain post processors may add a couple more milliseconds (output limiter, UHJ encoder, etc, which will be reported as "Fixed device latency: ..." in the trace log).
According to this page, starting with Windows 10 the default audio engine latency is 1.3ms, plus a 10ms default period size which will get written to the buffer for the hardware. So adding that all up, there should be about 51.3ms to 71.3ms if there's no other hidden latency anywhere. By changing OpenAL Soft's period size and period count properties, it could be reduced to a period size of 10ms and a 20ms buffer, which would make OpenAL Soft average 15ms, making the latency from OpenAL to output about 21.3ms to 31.3ms, although this will have a higher risk of underruns.
Before Windows 10, there's an additional 11ms for floating point sample streams and 5ms for integer sample streams. APOs may add more latency, but there's no information on whether any are normally in use.
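To make the arithmetic above easier to check, here's a rough sketch of the sum in Python. The function names are made up for illustration, and it assumes the app-side buffer is refilled one period at a time (so it sits between one period short of full and full). The 1.3ms engine latency and 10ms device period are the defaults quoted above, and `extra_ms` is a placeholder for the pre-Windows 10 additions.

```python
def buffered_latency_ms(period_ms, periods):
    """Range of audio queued in the app-side buffer, in ms.

    Assumes the buffer holds `periods` periods and is refilled one period
    at a time, so it sits between (periods - 1) and `periods` periods full.
    """
    buffer_ms = period_ms * periods
    return buffer_ms - period_ms, buffer_ms


def output_latency_ms(period_ms, periods, engine_ms=1.3,
                      device_period_ms=10.0, extra_ms=0.0):
    """Range of latency from the OpenAL mix to the hardware buffer, in ms.

    engine_ms and device_period_ms are the Windows 10 defaults quoted in
    the thread; extra_ms is for anything else (e.g. pre-Windows 10 streams).
    """
    low, high = buffered_latency_ms(period_ms, periods)
    fixed = engine_ms + device_period_ms + extra_ms
    return low + fixed, high + fixed


if __name__ == "__main__":
    for period_ms, periods in ((20, 3), (10, 2)):
        low, high = output_latency_ms(period_ms, periods)
        print(f"{period_ms}ms x {periods}: {low:.1f}ms to {high:.1f}ms")
```

With a 20ms period and 3 periods this gives the 51.3ms to 71.3ms range, and with a 10ms period and 2 periods it gives 21.3ms to 31.3ms. Passing `extra_ms=11.0` or `extra_ms=5.0` would cover the pre-Windows 10 floating point and integer cases.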
Alternatively, someone even modified OpenAL Soft to use WASAPI in exclusive mode, which I've tested and confirmed it does make a difference (tho I'm yet to measure it), so I forked it here https://github.com/ThreeDeeJay/openal-soft-WASAPI-exclusive/commit/9cd722fc9a80181cc9c86db9a0ec86728dafb7a3
Well, one apparent difference is it passes a bad period size to IAudioClient::Initialize (it passes the same size for the buffer and period size, using the buffer size, when the buffer size should be at least twice the period size), sets incorrect values for the OpenAL device's buffer and period size (it sets the period size using the buffer size), and doesn't properly pace updates (whenever the mixer thread wakes up, it processes however many samples WASAPI says are available, regardless of whether a full period is ready yet). It also seems to get the minimum period size before initialization and the buffer size after initialization, but does nothing with them. It's impossible to tell what the device is going to do with regard to buffering/latency.
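As a rough illustration of the two points about sizing and pacing, here's a small Python sketch. It's not OpenAL Soft's or WASAPI's code; `valid_buffer` and `frames_to_mix` are hypothetical helpers, and `writable_frames` stands in for however the backend reports free space.

```python
def valid_buffer(buffer_frames, period_frames):
    """Check the sizing rule from the comment above:
    the buffer should hold at least two periods."""
    return period_frames > 0 and buffer_frames >= 2 * period_frames


def frames_to_mix(writable_frames, period_frames):
    """Return how many frames to mix on this wakeup.

    Only whole periods are mixed. A wakeup that finds less than one period
    of writable space mixes nothing and waits for the next one, instead of
    processing whatever happens to be available.
    """
    if period_frames <= 0:
        raise ValueError("period_frames must be positive")
    return (writable_frames // period_frames) * period_frames


if __name__ == "__main__":
    period = 480  # 10ms at 48kHz, just an example value
    print(valid_buffer(480, period))   # buffer == period: too small
    print(valid_buffer(960, period))   # two periods: acceptable
    for writable in (100, 480, 1000):
        print(writable, "->", frames_to_mix(writable, period))
```

The point is just that updates stay on period boundaries, so the reported period size actually matches how the mixer is driven.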
I have always appreciated ASIO... if only because I had a Xonar sound card, and even for my poor Realtek I had found a *native* driver anyway (also, I think they had made some multiclient driver?). But is there really much of a point in 2022 over a "normal" API like IAudioClient3 in exclusive mode? https://github.com/mumble-voip/mumble/issues/1604
I mean, putting aside that I don't think games are meaningfully hampered by this. Academically speaking, is it worth at least 1ms? Or is it just a relic of another epoch when the Windows mixer was called KMixer?
p.s. as far as WDM-KS workarounds go.. I believe FlexASIO was the current champ
Inb4 this is as good as exclusive https://github.com/miniant-git/REAL
Not sure that would help too much. That simply forces the audio server/service to use a shorter update period, but the app's buffer size is left unchanged. Unless the app calculates a buffer size based on the device's period size, that would only cause more frequent updates for the same buffer size.
And actually for such cases, that would cause slightly higher overall latency since the buffer won't drain as much before doing another update. If the buffer is 40ms total, for example, the default 10ms period size would mean the buffer would have 30ms filled by the time an update occurs, meaning latency as low as 30ms for anything triggered just before the update; whereas if the period size is forced to 2ms (or whatever it sets), the same buffer will have 38ms filled when an update occurs, meaning latency closer to 38ms for anything triggered just before the update. So with the default period size, latency can vary between 30-40ms, whereas with a "low latency" 2ms period size, latency can vary between 38-40ms, a notably higher average and minimum bound.
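The example numbers can be reproduced with a tiny Python sketch. It assumes the buffer is topped up to full on every update, so just before an update it holds the total minus one period; `latency_bounds_ms` is a made-up helper name.

```python
def latency_bounds_ms(buffer_ms, period_ms):
    """(min, max, average) latency for a fixed-size buffer that is topped
    up to full once per period.

    Just before an update the buffer has drained by one period (minimum);
    just after, it is full again (maximum).
    """
    low = buffer_ms - period_ms
    high = buffer_ms
    return low, high, (low + high) / 2


if __name__ == "__main__":
    for period_ms in (10, 2):
        low, high, avg = latency_bounds_ms(40, period_ms)
        print(f"period {period_ms}ms: {low}-{high}ms, average {avg}ms")
```

For the same 40ms buffer, the 10ms period averages 35ms while the 2ms period averages 39ms, which is the "higher average and minimum bound" effect.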
In the case of OpenAL Soft, it uses a multiple of the device's period size to stay close to its internal 20ms update size (or whatever period_size is set to), with a total buffer that's 3 times the size (or whatever periods is set to). So latency and update granularity should remain somewhat consistent regardless of what REAL does. It will just be woken up more often to check if there's enough writable space to do a full update, wasting CPU time. REAL would allow you to set a smaller period size, since it would no longer be limited to a multiple of the 10ms default but to a multiple of whatever REAL sets, but it won't do anything on its own.
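Here's a sketch of that "multiple of the device period" idea in Python. The exact rounding OpenAL Soft uses isn't stated here, so picking the nearest whole multiple (with a minimum of one period) is an assumption, and `update_size_ms` and `total_buffer_ms` are hypothetical names.

```python
def update_size_ms(device_period_ms, target_ms=20):
    """Whole multiple of the device period closest to the target update
    size, never less than one device period.

    Nearest-multiple rounding is an assumption for illustration only.
    """
    multiple = max(1, int(target_ms / device_period_ms + 0.5))
    return multiple * device_period_ms


def total_buffer_ms(device_period_ms, target_ms=20, periods=3):
    """Total buffer: `periods` updates of the chosen update size."""
    return update_size_ms(device_period_ms, target_ms) * periods


if __name__ == "__main__":
    # Default 20ms target: same result for a 10ms or a 2ms device period.
    print(update_size_ms(10), update_size_ms(2))
    # A smaller 6ms target only becomes reachable with the 2ms device period.
    print(update_size_ms(10, target_ms=6), update_size_ms(2, target_ms=6))
```

With the default 20ms target, a 10ms or 2ms device period both end up at a 20ms update and a 60ms buffer, so nothing changes. Only when a smaller target is requested does the shorter device period make a difference.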
Never had luck getting less than 10ms with REAL.
I should point out that I didn't revert back to the Microsoft drivers (which is optional anyway) cuz I wouldn't wanna lose 7.1/5.1 in both my internal/USB sound cards.
I'm the one who made the modifications a long time ago. It was just a quick hack as a proof of concept, and wasn't really meant to be shared, no pun intended.
While not a controlled experiment, I used Wireshark with USBcap to measure the delta between a mouse click and a response in the audio stream. The advantage of this approach is that the DAC's own latency is factored out, but it assumes Wireshark is precise enough to be useful. Using the lowest period size I could in shared mode, it tended towards 30 milliseconds and up, while in exclusive mode, it was typically around 20 milliseconds. As mentioned before, this is very app-dependent, and some OpenAL game could easily create a delay close to three digits of milliseconds, so exclusive mode is hardly a panacea.