sof [BUG][META] alsa_conformance_test fails due to too high rate

Describe the bug This is a meta bug to track issues related to problems found by https://chromium.googlesource.com/chromiumos/platform/audiotest/+/HEAD/alsa_conformance_test.md with SOF driver. As similar discussions happen in multiple bugs, I filed a metabug in addition to just linking the issues, so that we can provide common commentary here.

Issues related to this meta bug:

https://github.com/thesofproject/sof/issues/8512
https://github.com/thesofproject/sof/issues/8458
https://github.com/thesofproject/sof/issues/6756
https://github.com/thesofproject/sof/issues/4617
https://github.com/thesofproject/linux/issues/4691
https://github.com/thesofproject/linux/pull/2381

What is the test about?

ALSA conformance rate test monitors (by busylooping on host CPU thread) value of snd_pcm_avail() and records any change of hardware pointer and uses a linear regression to estimate the rate (and calculate an error to estimate).

Why many SOF targets fail the test? Is this a bug?

The very purpose of SOF is to provide a framework to manage a programmable DSP, sitting between application and audio codec(s). The SOF devices must adhere to ALSA PCM API semantics, but within the limits set, the DSP can optimize data transfers based on its needs and the needs of the overall SoC. On many SOF platforms, the driver can report accurate hardware pointer value even during DMA bursts. From alsa_conformance_test's point of view, this shows up as high degree of burstiness of the snd_pcm_avail() values, when it polls the values many times within one ALSA period.

The test has been useful tool to analyze SOF driver and firmware behaviour, but especially the rate error calculation has proven to be complex and we have multiple cases where SOF is behaving correctly, but conformance test repots a failure.

Ways for driver to alert user-space of its particular behaviour

ALSA has a capability interface for driver to declare particular characteristics of drivers. Following interface could be of potential benefit for test apps like alsa_conformance_test to better understand the device it is testing:

Driver flag	alsa-lib	Usage in applications	Can help with SOF and alsa_conformance_test
SNDRV_PCM_INFO_BATCH	snd_pcm_hw_params_is_batch()	Widely used by apps to detect drivers that update hw_ptr only once per period.	Not helpful, SOF drivers can updated hw_ptr continuously
SNDRV_PCM_INFO_DOUBLE	sndpcm_hw_params_is_double()	No known usage in apps. Only a few kernel drivers set.	Not helpful, SOF is not double-buffering like the RME drivers in kernel
SNDRV_PCM_INFO_BLOCK_TRANSFER	snd_pcm_hw_params_is_block_transfer()	No known usage in apps, but set by most ALSA driver... but not by any SOF driver.	Semantics very unclear, device transfers data in block but that's it. Given this is set by so many drivers without any actual user-space usage, starting to use that with more specific semantics does not seem like a good idea.

In summary, there is no means to relay the size of DMA bursts from driver to user-space application. INFO_BATCH could be set by the driver, but that will currently disable other functionality (e.g. in Pulseaudio and Pipewire) if this flag is set.

Existing solutions/workarounds -- merge threshold

A merge feature was added to alsa_conformance_test. By setting --merge_threshold_sz and/or --merge_threshold variable, the test can be instructed to merge measurements that are very close to each other. In most cases, setting the merge threshold to be equal to period size, will result in a passing test with low error rate. To comply with ALSA PCM API, the driver must update the hardware pointer position at least every period. In practise, the tool's existing method to do a dry-run and detect the median step increase, works for the majority of cases (see comment https://github.com/thesofproject/sof/issues/8717#issuecomment-1889518307).

This method is not bullet proof. E.g. https://github.com/thesofproject/sof/issues/8512 failed despite merge threshold being set.

alsa_conformance_test also had a change request to make this the default, but the change was reverted https://chromium-review.googlesource.com/c/chromiumos/platform/audiotest/+/4220795 so it also caused failures on platforms where test was passing without the merge threshold being set.

**Solutions under investigation: aligning to period size **

Instead of merge threshold, align analysis to full period size blocks of data. Big part of the problem seems to be that the hardware pointer may be sampled in middle of a DMA burst. So what if the analysis is forced to happen only at period intervals? This will ensure samples are not taking in middle of DMA bursts, but rest of the analysis can be done with existing logic.

This solution has a problem with potential hiding problems on non-SOF platforms, so would have to be a dedicated option.

UPDATE Jan 12th: updating merge threshold from median step size to period size does seem to solve any fundamental issue at least for the standard 1ms LL tick configuration (see comment https://github.com/thesofproject/sof/issues/8717#issuecomment-1889518307 ). To cover cases with larger host buffer ("deepbuffer" in some SOF topologies), this may still be applicable.

Solutions under investigation: exposing size of host DMA buffer

In many SOF setups, the host DMA buffer size is a small multiple of the Low-Latency scheduler tick length (typically 1ms). This is not always the case, "deep buffer" configurations may define a much longer buffer (e.g. 10ms). Notably this duration will be shorter than the ALSA period size. If this information would be available to user-space, measurements could be aligned (either via merge threshold or a new method) to avoid DMA bursts. OTOH, it's not clear how much benefit this has over just using the period size.

Alternative solutions

In the linked bugs, it has been pointed out that a more robust test for sample rate correctness and sample rate stability is to playback/capture a known reference tone and analyze its characteristics (as described in https://github.com/thesofproject/sof/issues/4617#issuecomment-919776192 . This is especially important as the alsa conformance test relies on system clock (MONOTONIC_RAW source) to sample the audio time, but this is not guaranteed to be aligned with the audio susbsystem's clock. Analysis of a reference tone is immune to this error. The alsa_conformance_test allows to capture failures of this type without need for an external test setup (reference tone generator/capture), so it has its own merits and places of use (and it has been used long in ChromeOS and Android), but the methodology limitations should be noted.

New Tools to Debug

alsa_conformance_test: add debug mode for step/avail analysis, https://github.com/kv2019i/cros-audiotest/commit/d4d6f422554e8c8f03c6d5591f7afa922e1471e3 Allow to visually plot the snd_pcm_avail() updates and some new debugging to use debug behaviour with bursty DMA. Added Jan 12 2024.

Jan 09 '24 18:01 kv2019i

"in most cases, setting the merge threshold to be equal to period size, will result in a passing test with low error rate."

did you mean equal to the DSP firmware period size (1ms) ?

Jan 10 '24 07:01 plbossart

@plbossart wrote:

"in most cases, setting the merge threshold to be equal to period size, will result in a passing test with low error rate."

did you mean equal to the DSP firmware period size (1ms) ?

I did actually refer to period size on purpose. The below commentary is potentially a bit redundant (or at least beside the point) as my latest tests this week demonstrate that the existing method in alsa-conformance-test to figure out the step increase median, actually works very well with SOF, and e.g. with the TGL and MTL targets, finds the 1ms burst rate reliably.

But with that disclaimer, my rationale to not use LL tick length is that most SOF targets will have extra buffering towards the host. E.g. Intel SOF targets have a DMA buffer of 4 x LL period, e.g. 4ms in typical configuration (and this really doesn't scale for e.g. 10ms LL tick). The exact reasons why this is 4 (and not 2) is subject to some long discussions as this magic number dates back to origins of the SOF project. Notably this is platform specific and other SOF targets may use a lower/higher number. What we can say though is:

In normal operating conditions:

SOF DSP will consume/produce data every 1ms (or what is the LL tick [footnote1]), there's no 4ms delay added by design

In exceptional cases:

If DDR transactions are delayed (but DSP continues LL ticks), the host side SRAM buffer can mitigate this (but transient longer bursts are seen)
If DSP LL scheduler is overloaded (but host DMA continues), the host-side buffer can mitigate this and avoid data loss.

Both are in practise extremely rare occurences, but the extra slices of buffering protect again audible glitches. But they also add to the maximum burst size.

OTOH, bursts that would exceed the set period size, would cause problems to large amount of ALSA apps, so this woud seem to be a safe limit for the application in terms of max expected burst size.

That brings me to my opening sentence in the comment: in practise the alsa-conformance-test's mechanism to discover the median step size, seems to be working well. I'll make another comment on other sources of errors I've discovered.

[footnote1] with larger than 1ms host buffer, the SOF host DMA driver has a feature (CONFIG_HOST_DMA_RELOAD_DELAY_ENABLE) which can modify the pace of DMA transactions - this is still constrainted to host period size

Jan 12 '24 14:01 kv2019i

Then some updates on avail testing. I put together a modified conformance-test tool that can be used to plot the snd_pcm_avail() visually and print out more data in case anomalies are seen: https://github.com/kv2019i/cros-audiotest/commit/d4d6f422554e8c8f03c6d5591f7afa922e1471e3

The data now contradicts with some of the earlier hypothesis:

the method of figuring out the median step size (and using that to set the merge threshold automatically), seems to be working correctly -> e.g. on the typical Intel SOF hda-generic device, this will be 1ms and 48 sample frames
despite busyloop on snd_pcm_avail() (+50000 system calls per second!), it's a rare occurence to hit a middle of a 1ms DMA burst and the existing logic to handle the burst seems to tackle this (the observation is merged before regression) (ref:log1)
on multiple runs with high reported rate error, the rootcause seems to be a scheduling gap -- there's a big jump in snd_pcm_avail(), but this is caused by the test application not running, not by bad hw ptr movement (ref:log2)

ref:log1 - example of test samplign snd_pcm_avail() in mdidle of DMA burst

--cut--
Time(msec)  Delta(msec)	 AvailCh Read+Avai Read+D Nominal(Sampl         Avail-Diff(Samples)
3305.987127 0.999514     19     151094     151094 158687.382096         48
3307.017154 1.030027     18     151142     151142 158736.823392         48
3307.967392 0.950238     17     151144     151190 158782.434816          2
3308.018751 0.051359      1     151190     151190 158784.900048         46
3308.979559 0.960808     17     151192     151238 158831.018832          2
3309.036684 0.057125      1     151238     151238 158833.760832         46
3309.980424 0.943740     17     151286     151286 158879.060352         48
3310.979021 0.998597     18     151334     151334 158926.993008         48
3312.000993 1.021972     20     151382     151382 158976.047664         48
--cut--

Every line marks a condition where user-space has detected snd_pcm_avail() to change value. In above, when a partial burst is reported (like 46 samples), it is immediately followed up by new measurement of 2 samples. This will be handled corretly and not reported as a rate error.

ref:log2 - example of test failing to observe snd_pcm_avail()

--cut--
4773.287684 0.977670     88     218006     218006 229117.808832         48
4774.290103 1.002419     91     218054     218054 229165.924944         48
note: a 12.1msec gap, the alsa-conformance-test was not simply scheduled during this time!
4786.434342 12.144239      1     218630     218630 229748.848416        576
4787.282624 0.848282     72     218660     218678 229789.565952         30
4787.307675 0.025051      1     218678     218678 229790.768400         18
4788.284404 0.976729     90     218688     218726 229837.651392         10
4788.300819 0.016415      1     218726     218726 229838.439312         38
4789.284356 0.983537     92     218774     218774 229885.649088         48
4791.280941 1.996585    157     218776     218820 229981.485168          2
4791.306447 0.025506      1     218822     218822 229982.709456         46
4792.290079 0.983632     87     218870     218870 230029.923792         48
--cut--

The one case where snd_pcm_avail() moved by 576 sample frames, the avail check number of 1 tells that the busyloop did not run any extra times during the 12msec that has elapsed. This type of errors do show up as higher calculated error rate even though regression filters out the impact a bit.

I can't say how many of the open bugs are explained by similar phenomena, but definitely this needs to be checked for each such case now. If the changes to audiotest tool prove useful, I can upstream a version to CrOS. The current version is overlaping with the debug functionality, so cannot be upstreamed in its current form.

The reliability of alsa-conformance-test scheduling could be fixed by using POSIX_REALTIME scheduling class, but mixing POSIX realtime and user-space code that busyloops, is very, very dangerous, so probably a better approach would be to modify the test app to be driven by some high frequency interrupt source and poll snd_pcm_avail() from there.

Jan 12 '24 15:01 kv2019i

On many SOF platforms, the driver can report accurate hardware pointer value even during DMA bursts. From alsa_conformance_test's point of view, this shows up as high degree of burstiness of the snd_pcm_avail() values, when it polls the values many times within one ALSA period.

I don't understand all the details but something sounds fundamentally wrong.

Either the test is a low-level, fine-grained, white box test that has knowledge of all relevant information about DMAs, scheduling and burstiness,
or the test is high-level, abstract black box test that does not know implementation details. Then it should look ONLY at the final product (= the audio)

A test cannot be both high level and low level at the same time. Yet a lot of the discussions seem to constantly swing back and forth between high level and low level. This does not make sense; which is it?

In summary, there is no means to relay the size of DMA bursts from driver to user-space application.

So this can't be a "white box" test for now, not until it is possible to reliably expose the needed implementation details. Until the required information is available, tests should stay high-level and they should stop trying and failing to make burstiness guesses. For now testing should stick instead to actual, confirmed Xruns and not what happens before that point. Real-time does not care about averages, it's only about making deadlines.

What am I missing?

works for the majority of cases

A test that makes guesses and works "most of the time" does a lot more harm than good.

Don't get me wrong: collecting and graphing information is still extremely useful, even with incomplete knowledge. But at the end of the day the PASS/FAIL decision cannot rely on guesses.

Jan 16 '24 16:01 marc-hb

@marc-hb wrote:

A test cannot be both high level and low level at the same time. Yet a lot of the discussions seem to constantly swing back and > forth between high level and low level. This does not make sense; which is it? [...] So this can't be a "white box" test for now, not until it is possible to reliably expose the needed implementation details. Until the > required information is available, tests should stay high-level and they should stop trying and failing to make burstiness guesses.

I think you hit the nail on the head, this is the core issue and something I want to address in this metabug. So questions discussed in linked bugs:

does ALSA PCM interface limit how bursty hw_ptr can be for driver to still comply with ALSA PCM interface?
is the merge-threshold mechanism using information that is inherently implementation specific?

So I looked at (1) again and my take is there are no strict established limits other than the period-size as configured by the application. I surveyed the hw info flags and the facilities to describe hw_ptr behaviour and they are very limited. Also the almost non-existant use of these flags in applications (and alsa-lib) shows this is not something apps depend on (with the named exception of BATCH which I describe in above, used by all sound servers).

As for (2), in multiple older bugs, we'd had the hypothesis merge-threshold is not handlng the DMA bursts correctly and this would then be invalid assumption for a black-box ALSA PCM interface test. When I've now done fresh debug on current gen platforms, I can't confirm this hypothesis. In fact, in the cases I've looked at, merge-threshold is working correctly and the rate error comes from a different source.

So status as of now is that this is a valid blackbox test of the ALSA PCM interface. We'll need to take another look at bugs like https://github.com/thesofproject/sof/issues/8458 to see where the rate error comes from (or have new proof it comes from DMA bursts).

Jan 19 '24 12:01 kv2019i

Back when we had Linux Audio Miniconferences, there was a recurring theme that would come up, viz. SNDRV_PCM_INFO_BLOCK_TRANSFER is not a fine-grained enough indicator for user-space, and that perhaps having a way for the driver to expose the transfer size would be useful -- then userspace could use that for some sort of safety margin.

Jan 09 '25 22:01 ford-prefect

@ford-prefect ack - we were all discussing this again last week. @kv2019i @ujfalusi @ranj063 @bardliao @sathya-nujella @udaymb

Our DMA is non linear vs time or asynchronous when copying audio, this is to conserve power and only copy when memory bus is active already. It has logic to make sure data is always copied before deadline though. The delta to real position is also variable between a min and max i.e. we cant pass up a constant to userspace. I think @kv2019i is working on some proposals.

Jan 13 '25 16:01 lgirdwood

Just to add a note to @lgirdwood @ford-prefect , in context of alsa_conformance_test (i.e. this bug), I'm standing by my previous comment (see my comment above Jan 2024). IOW, "So status as of now is that this is a valid blackbox test of the ALSA PCM interface. " The hw_ptr burstiness makes debugging conformance test bugs harder, but as far as I can tell surveying our existing base of bugs, it's not the rootcause of failures.

The new bug with Pipewire is something different: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4489 ... for this, we can reconsider exposing the maximum burst size info. I can file a separate ticket for this.

Jan 29 '25 12:01 kv2019i

As per my comnent on Jan 29, I don't think we have actions to take for 2.13 release, so propose to remote the 2.13 milestone from this.

Apr 23 '25 13:04 kv2019i

And follow-up to previous comment, clearing out v2.13 from this.

May 02 '25 08:05 kv2019i