Starting from kernel 5.10.x, the latency test fails systematically on multiple platforms
If I run the latency test application (test/latency.c) with a Linux kernel 5.9.x and alsa-lib v1.2.4 on the following audio card (for example): PCH [HDA Intel PCH], device 1: ALC269VC Analog [ALC269VC Analog], I get the following result:
sudo ./latency -C hw:1 -P hw:1 -f S16_LE -r 48000 -c 2 -s 10 -m 128 -M 128
Scheduler set to Round Robin with priority 99...
Playback device is hw:1
Capture device is hw:1
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 480000 frames, minimum latency = 128, maximum latency = 128
Hardware PCM card 1 'HDA Intel PCH' device 0 subdevice 0
Its setup is:
stream : PLAYBACK
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Hardware PCM card 1 'HDA Intel PCH' device 0 subdevice 0
Its setup is:
stream : CAPTURE
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Trying latency 128 frames, 2666.667us, 2.666667ms (375.0000Hz)
Success
Playback:
*** frames = 480128 ***
state : RUNNING
trigger_time: 1907393.672215
tstamp : 0.000000
delay : 80
avail : 48
avail_max : 64
Capture:
*** frames = 480000 ***
state : RUNNING
trigger_time: 1907393.672215
tstamp : 0.000000
delay : 0
avail : 0
avail_max : 16
Maximum read: 16 frames
Maximum read latency: 333.333us, 0.333333ms (3000.0000Hz)
Hardware sync
Playback time = 1907393.672215, Record time = 1907393.672215, diff = 0
The tests succeeded and final state is RUNNING
If I run the same test on the same audio device and distribution but with a Linux kernel >= 5.10.x, it fails systematically and I get:
sudo ./latency -C hw:1 -P hw:1 -f S16_LE -r 48000 -c 2 -s 10 -m 128 -M 128
Scheduler set to Round Robin with priority 99...
Playback device is hw:0
Capture device is hw:0
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 480000 frames, minimum latency = 128, maximum latency = 128
Hardware PCM card 0 'HDA Intel PCH' device 0 subdevice 0
Its setup is:
stream : PLAYBACK
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Hardware PCM card 0 'HDA Intel PCH' device 0 subdevice 0
Its setup is:
stream : CAPTURE
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Trying latency 128 frames, 2666.667us, 2.666667ms (375.0000Hz)
Failure
Playback:
*** frames = 90720 ***
state : XRUN
trigger_time: 967.658489
tstamp : 0.000000
delay : 0
avail : 216
avail_max : 216
Capture:
*** frames = 90592 ***
state : XRUN
trigger_time: 967.658489
tstamp : 0.000000
delay : 0
avail : 88
avail_max : 88
Maximum read: 16 frames
Maximum read latency: 333.333us, 0.333333ms (3000.0000Hz)
Hardware sync
I get an XRUN during the test execution. I can reproduce the same error on multiple boards with multiple audio cards and on different Linux distributions just by changing the Linux kernel version. The problem was originally reported at: AES67 Linux daemon: Latency test fails with Linux kernel starting from 5.10.0x
I am interested in testing with low latency (-m 128 -M 128). My guess is that something changed in the Linux audio core and the latency application has to be adapted. Any ideas?
It may not be an audio issue at all. Some other task or kernel driver may be eating CPU time and preventing low-latency operation. From the result:
avail : 216
avail_max : 216
It means that the playback is missing 88 samples (216 - 128 = 88) at the under-run check.
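For context, here is the arithmetic behind that statement (a rough illustration of the playback avail accounting, not the actual kernel check): for a playback stream, avail is the free space in the ring buffer, so it can only exceed buffer_size when the hardware pointer has run past the application pointer.

/* Rough illustration of the playback avail / under-run arithmetic
 * (simplified sketch, not the actual kernel check). */
#include <stdio.h>

int main(void)
{
	long buffer_size = 128;	/* frames, from the setup dump above */
	long avail = 216;	/* avail reported at the XRUN */

	/* playback: avail = buffer_size - (appl_ptr - hw_ptr),
	 * so avail > buffer_size means hw_ptr ran past appl_ptr */
	printf("hardware ran %ld frames past the application pointer\n",
	       avail - buffer_size);
	return 0;
}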
You can trace the syscalls with strace (-r -o trace.txt), for example: strace -r -s 256 -o trace.txt ./latency -C hw:2 -P hw:2 -f S16_LE -r 48000 -c 2 -m 128 -M 128 -X:
0.000027 write(1, "[146516us] POS: p=72 (min=40, rt=817us) c=8 (max=40, rt=-183us)\n", 64) = 64
0.000030 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffcc4c7c000) = 0
0.000029 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffcc4c7bfd0) = 0
0.000026 ioctl(5, SNDRV_PCM_IOCTL_HWSYNC, 0) = 0
0.000029 ioctl(4, SNDRV_PCM_IOCTL_HWSYNC, 0x1) = 0
0.000028 write(1, "[146660us] POS: p=72 (min=40, rt=840us) c=8 (max=40, rt=-160us)\n", 64) = 64
0.000030 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffcc4c7c000) = 0
0.000097 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffcc4c7bfd0) = 0
0.000027 ioctl(5, SNDRV_PCM_IOCTL_HWSYNC, 0) = 0
0.000136 ioctl(4, SNDRV_PCM_IOCTL_HWSYNC, 0x1) = 0
0.000270 write(1, "[147220us] POS: p=64 (min=40, rt=613us) c=8 (max=40, rt=-553us)\n", 64) = 64
0.000059 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffcc4c7c000) = 0
0.000042 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffcc4c7bfd0) = 0
0.000054 ioctl(5, SNDRV_PCM_IOCTL_HWSYNC, 0) = 0
* 0.000871 ioctl(4, SNDRV_PCM_IOCTL_HWSYNC, 0x1) = 0
0.000061 write(1, "[148305us] POS: p=16 (min=16, rt=695us) c=32 (max=40, rt=-972us)\n", 65) = 65
0.000074 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffcc4c7c000) = 0
0.000101 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffcc4c7bfd0) = -1 EPIPE (Broken pipe)
without -X:
0.000020 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = 0
0.000025 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffda8f7c980) = 0
0.000019 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = -1 EAGAIN (Resource temporarily unavailable)
0.000022 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = -1 EAGAIN (Resource temporarily unavailable)
0.000023 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = -1 EAGAIN (Resource temporarily unavailable)
0.000022 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = -1 EAGAIN (Resource temporarily unavailable)
0.000022 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = -1 EAGAIN (Resource temporarily unavailable)
* 0.001841 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = 0
0.000081 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffda8f7c980) = -1 EPIPE (Broken pipe)
Legend: the scheduling time gaps are marked with *.
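For context, the repeated EAGAIN results above are the expected pattern for a non-blocking capture loop: the test keeps retrying the read until frames are available, so any scheduling gap shows up directly as extra latency. A minimal sketch of such a loop using the public alsa-lib API (this is only my assumption of the general shape, not the actual latency.c loop; link with -lasound):

/* Minimal non-blocking capture loop sketch (not the actual latency.c code).
 * snd_pcm_readi() returns -EAGAIN while no frames are available. */
#include <alsa/asoundlib.h>

int main(void)
{
	snd_pcm_t *pcm;
	short buf[64 * 2];		/* 64 frames, 2 channels, S16_LE */
	int err;

	err = snd_pcm_open(&pcm, "hw:0", SND_PCM_STREAM_CAPTURE, SND_PCM_NONBLOCK);
	if (err < 0) { fprintf(stderr, "open: %s\n", snd_strerror(err)); return 1; }
	err = snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
				 SND_PCM_ACCESS_RW_INTERLEAVED,
				 2, 48000, 1, 20000 /* 20 ms buffer */);
	if (err < 0) { fprintf(stderr, "set_params: %s\n", snd_strerror(err)); return 1; }
	snd_pcm_start(pcm);

	for (;;) {
		snd_pcm_sframes_t n = snd_pcm_readi(pcm, buf, 64);
		if (n == -EAGAIN)
			continue;	/* nothing captured yet - busy loop */
		if (n < 0) {
			fprintf(stderr, "readi: %s\n", snd_strerror(n));
			break;		/* -EPIPE here would be an overrun */
		}
		/* ... hand the n captured frames to the playback side ... */
	}
	snd_pcm_close(pcm);
	return 0;
}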
Perhaps it may be worth testing with the vanilla kernel and bisecting to the commit which affects this behavior. Another option is kernel profiling.
Thanks for the info. I can reproduce this by simply changing the kernel from any version <= 5.9.x to any version >= 5.10.x: nothing else changes, just the kernel. I'm using the precompiled kernels for the Ubuntu distro (with a very similar configuration) and the problem reproduces systematically on both x86_64 and ARM64 platforms. Other people using various Linux distributions have reported the same, so my guess is that the problem is related to some kernel change introduced in version 5.10.x that impacts the low-latency behaviour.
I did some debugging using ftrace (trace-cmd) and it really seems that the scheduler blocks the latency task in favour of a kworker task:
latency-7395 [002] 22719.778930: hwptr: pcmC0D0c/sub0: POS: pos=16, old=26768, base=26752, period=64, buf=128
latency-7395 [002] 22719.778932: hwptr: pcmC0D0c/sub0: POS: pos=16, old=26768, base=26752, period=64, buf=128
latency-7395 [002] 22719.778934: hwptr: pcmC0D0c/sub0: POS: pos=16, old=26768, base=26752, period=64, buf=128
latency-7395 [002] 22719.778937: hwptr: pcmC0D0c/sub0: POS: pos=48, old=26768, base=26752, period=64, buf=128
latency-7395 [002] 22719.778938: applptr: pcmC0D0c/sub0: prev=26768, curr=26800, avail=0, period=64, buf=128
latency-7395 [002] 22719.778941: hwptr: pcmC0D0p/sub0: POS: pos=96, old=26816, base=26752, period=64, buf=128
latency-7395 [002] 22719.778942: applptr: pcmC0D0p/sub0: prev=26896, curr=26928, avail=48, period=64, buf=128
latency-7395 [002] 22719.778949: sched_switch: latency:7395 [0] S ==> kworker/2:1:5386 [120]
<idle>-0 [007] 22719.778971: irq_handler_entry: irq=182 name=snd_hda_intel:card0
<idle>-0 [007] 22719.779013: hwptr: pcmC0D0c/sub0: IRQ: pos=48, old=26800, base=26752, period=64, buf=128
<idle>-0 [007] 22719.779070: irq_handler_exit: irq=182 ret=handled
<idle>-0 [007] 22719.779073: irq_handler_entry: irq=182 name=snd_hda_intel:card0
<idle>-0 [007] 22719.779084: irq_handler_exit: irq=182 ret=unhandled
kworker/2:1-5386 [002] 22719.779162: sched_wakeup: latency:7395 [0] CPU:002
<idle>-0 [007] 22719.780474: irq_handler_entry: irq=182 name=snd_hda_intel:card0
<idle>-0 [007] 22719.780524: sched_wakeup: kworker/7:1:5655 [120] CPU:007
<idle>-0 [007] 22719.780567: hwptr: pcmC0D0p/sub0: IRQ: pos=48, old=26848, base=26752, period=64, buf=128
<idle>-0 [007] 22719.780567: xrun: pcmC0D0p/sub0: XRUN: old=26928, base=26880, period=64, buf=128
<idle>-0 [007] 22719.780699: irq_handler_exit: irq=182 ret=handled
<idle>-0 [007] 22719.780701: irq_handler_entry: irq=182 name=snd_hda_intel:card0
<idle>-0 [007] 22719.780729: irq_handler_exit: irq=182 ret=unhandled
<idle>-0 [007] 22719.780736: sched_switch: swapper/7:0 [120] R ==> kworker/7:1:5655 [120]
kworker/7:1-5655 [007] 22719.780744: sched_switch: kworker/7:1:5655 [120] W ==> swapper/7:0 [120]
kworker/2:1-5386 [002] 22719.790242: sched_switch: kworker/2:1:5386 [120] R ==> latency:7395 [0]
latency-7395 [002] 22719.790455: hw_mask_param: pcmC0D0p:0 000/025 ACCESS 0000000000000000ffffffffffffffff 00000000000000000000000000000009
latency-7395 [002] 22719.790455: hw_mask_param: pcmC0D0p:0 000/025 FORMAT 0000000000000000ffffffffffffffff 00000000000000000000000000000404
latency-7395 [002] 22719.790456: hw_mask_param: pcmC0D0p:0 000/025 SUBFORMAT 0000000000000000ffffffffffffffff 00000000000000000000000000000001
As you can see, the latency task was switched out at 22719.778949 and the system gave the CPU back at 22719.790242, so the gap is 0.011293 seconds (~11.3 ms). @tiwai: FYI - could you check my interpretation, please?
It really looks like a problem in the Linux scheduler or the kworker task.
I think I found the upstream scheduler change between the 5.9 and 5.10 kernels which causes this behavior:
https://lore.kernel.org/lkml/c596a06773658d976fb839e02843a459ed4c2edf.1479204252.git.bristot@redhat.com/
See the patch description. You can set the old behaviour using this command:
sysctl -w kernel.sched_rt_runtime_us=1000000
Note that this command may be dangerous - see the patch description. A realtime task can then eat the whole CPU and prevent other tasks from running.
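For background (my summary, not part of the patch discussion): kernel.sched_rt_runtime_us, together with kernel.sched_rt_period_us, defines the realtime throttling budget. The defaults are 950000/1000000, i.e. SCHED_FIFO/SCHED_RR tasks may use at most 95% of each 1-second period and are throttled for the remaining 5%; setting the runtime equal to the period (or to -1) removes that throttling, which is why the command above changes the behaviour but also lets a runaway RT task monopolize the CPU. A minimal sketch that just reads the current budget from /proc/sys:

/* Minimal sketch: read the current RT throttling budget and report
 * what fraction of each period realtime tasks may consume. */
#include <stdio.h>

int main(void)
{
	long runtime, period;
	FILE *fr = fopen("/proc/sys/kernel/sched_rt_runtime_us", "r");
	FILE *fp = fopen("/proc/sys/kernel/sched_rt_period_us", "r");

	if (!fr || !fp || fscanf(fr, "%ld", &runtime) != 1 ||
	    fscanf(fp, "%ld", &period) != 1) {
		fprintf(stderr, "could not read the RT throttling sysctls\n");
		return 1;
	}
	fclose(fr);
	fclose(fp);

	if (runtime < 0 || runtime >= period)
		printf("RT throttling disabled (runtime=%ld, period=%ld)\n",
		       runtime, period);
	else
		printf("RT tasks limited to %.1f%% of each period (runtime=%ld, period=%ld)\n",
		       100.0 * runtime / period, runtime, period);
	return 0;
}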
For the archiving purposes - trace-cmd commands:
LATENCY=128 ; USLEEP=200
trace-cmd start -b 10000 -e 'sched_wakeup*' -e sched_switch -e gpio_value -e irq_handler_entry -e irq_handler_exit -e snd_pcm \
/home/redhat/git/alsa-lib/test/.libs/latency -C hw:0 -P hw:0 -f S16_LE -r 48000 -c 2 -s 10 -m "${LATENCY}" -M "${LATENCY}" -y "${USLEEP}"
trace-cmd stop
trace-cmd extract
optionally (to check workqueues):
trace-cmd start -b 10000 -e 'sched_wakeup*' -e sched_switch -e workqueue ...
Use kernelshark or trace-cmd report to analyze the trace.dat file.
Great catch! This seems to work for me. I will test it on some more platforms and report here. Thanks a lot for the prompt support!
Would a different RT setup in latency program change the behavior? It sounds like a kernel regression if it shows such a latency even with the highest RT priority.
latency.c sets the scheduler to round-robin (SCHED_RR) with the maximal priority 99. The mentioned kernel patch allows non-realtime tasks to interrupt tasks even at this highest priority, and it seems to do so in a large time window (from the realtime perspective) - tens of milliseconds. From my analysis, the workqueue tasks are fine (up to 1 ms), but the scheduler wakes the latency task (a busy-loop task) too late.
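For reference, that scheduler setup boils down to a sched_setscheduler() call like the following (a simplified sketch of what latency.c does, assuming the usual POSIX calls; root or CAP_SYS_NICE is needed for RR/FIFO):

/* Simplified sketch of the latency.c scheduler setup: request SCHED_RR
 * at the maximum realtime priority for the current process. */
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct sched_param sp;

	sp.sched_priority = sched_get_priority_max(SCHED_RR);	/* 99 on Linux */
	if (sched_setscheduler(0, SCHED_RR, &sp) < 0) {
		perror("sched_setscheduler");	/* needs root / CAP_SYS_NICE */
		return 1;
	}
	printf("Scheduler set to Round Robin with priority %d\n", sp.sched_priority);
	return 0;
}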
I added a new option to the latency test code to allow using SCHED_FIFO, but the results are not good either. It seems that the scheduler penalty for busy-loop programs is high in current kernels, and the scheduler prefers to interrupt this busy task rather than use "free" CPUs.
For reference (pipewire discussion): https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/1271
On the platforms I am currently testing, if I change the latency application to skip the setscheduler() call entirely, the latency app works fine on kernels >= 5.10.x (even without setting the kernel.sched_rt_runtime_us parameter). The final behaviour with the new kernel still looks unexpected to me.
With a kernel <= 5.9.x I get the opposite behaviour: if I skip the setscheduler() call, the latency test fails.
Wow, I did not expect that. It seems that SCHED_OTHER is better than FIFO/RR with the current kernels. I added SCHED_OTHER policy support to latency.c - -O other (75f8e2e73e03f628a4f2ba55ca8aa3e9f50cdbd9). So we can select the policy without code modification now.
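So, assuming the new option works as described, the failing run from the start of this thread should be repeatable on >= 5.10.x kernels with the non-realtime policy, e.g.:

sudo ./latency -C hw:1 -P hw:1 -f S16_LE -r 48000 -c 2 -s 10 -m 128 -M 128 -O other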
It sounds OK to me. Just to summarise: according to my tests, SCHED_OTHER is required to make the latency test work with a kernel >= 5.10.x and SCHED_RR is required with a kernel <= 5.9.x.
In addition to allowing the user to specify the scheduler policy, could the latency application apply a default scheduler according to the above findings?
I think it would be bad to change the application defaults according to the kernel version. Also, it's clearly a kernel scheduler problem which should be reported to the LKML (Linux kernel mailing list) and the scheduler maintainers - see https://github.com/torvalds/linux/blob/c3eb11fbb826879be773c137f281569efce67aa8/MAINTAINERS#L18348 .