sof icon indicating copy to clipboard operation
sof copied to clipboard

[BUG] <wrn> runtime_suspend: Unbalanced suspend; <err> host_trigger(): dma stop failed: -120 in pause-resume test

Open marc-hb opened this issue 2 years ago • 8 comments

Describe the bug

multiple-pause-resume.sh test failure FW reported error: 164 - Unknown

2024-01-11 20:09:21 UTC [ERROR] Caught kernel log error
===========================>>
[ 2099.046497] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: FW reported error: 164 - Unknown
[ 2099.046518] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc error for msg 0x13000002|0x1
[ 2099.046526] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: failed to set final state 2 for all pipelines
[ 2099.047459] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: FW reported error: 12 - Required resource is in invalid state
[ 2099.047475] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc error for msg 0x12000000|0x0
[ 2099.047482] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: failed to free pipeline widget pipeline.1
[ 2099.048267] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to free connected widgets
[ 2099.048276] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_free on 0000:00:1f.3: -22
[ 2099.684284] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: FW reported error: 9 - Specified resource not found
[ 2099.684314] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc error for msg 0x11000007|0x0
[ 2099.684325] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: failed to create module pipeline.1
[ 2099.684335] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to set up connected widgets
[ 2099.684344] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: error: failed widget list set up for pcm 0 dir 0
[ 2099.684350] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_params on 0000:00:1f.3: -22
[ 2099.696268] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: FW reported error: 9 - Specified resource not found
[ 2099.696287] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc error for msg 0x11000002|0x100000
[ 2099.696294] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: failed to create module pipeline.10
[ 2099.696813] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to set up connected widgets
[ 2099.696824] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: error: failed widget list set up for pcm 1 dir 1
[ 2099.696831] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_params on 0000:00:1f.3: -22

To Reproduce

TPLG=/lib/firmware/intel/development/sof-mtl-nocodec.tplg MODEL=MTLP_RVP_NOCODEC SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pause-resume.sh -r 50

Reproduction Rate

Apparently very low

Expected behavior

Test does not fail

Impact

Unknown.

Environment

From https://sof-ci.01.org/sofpr/PR8722/build1806/devicetest/index.html

TEST #36783 CI build start time: 2024-01-11T11:32:53-08:00 Linux Branch: topic/sof-dev Commit: 623c00b5e977 Kconfig Branch: master Kconfig Commit: 40725018b505 PR ID: 8722 (8722 is likely unrelated) Commit: Merge c803c465543a into 8d2fb32fa501a213ab7fc482 Zephyr Commit: d7af6f371034

Screenshots or console output

[ 2099.035213] <inf> pipe: pipeline_trigger: pipe:1 0x0 pipe trigger cmd 2
[ 2099.035565] <inf> dai_intel_ssp: dai_ssp_pause: dai_ssp_pause TX
[ 2099.035575] <inf> ll_schedule: zephyr_ll_task_done: task complete 0xa0116a40 0x20210U
[ 2099.035580] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 1 total_num_tasks 1
[ 2099.035591] <inf> ll_schedule: zephyr_domain_unregister: zephyr_domain_unregister domain->type 1 domain->clk 3
[ 2099.045740] <inf> ipc: ipc_cmd: rx	: 0x13000002|0x1
[ 2099.045765] <inf> pipe: pipeline_trigger: pipe:1 0x0 pipe trigger cmd 0
[ 2099.045823] <inf> dai_intel_ssp: dai_ssp_update_bits: dai_ssp_update_bits base 28000, reg 4, mask 200000, value 0
[ 2099.045831] <inf> dai_intel_ssp: dai_ssp_update_bits: dai_ssp_update_bits base 28000, reg 30, mask 100, value 0
[ 2099.045838] <inf> dai_intel_ssp: dai_ssp_stop: dai_ssp_stopTX stop
[ 2099.045856] <inf> dai_intel_ssp: dai_ssp_update_bits: dai_ssp_update_bits base 28000, reg 0, mask 80, value 0
[ 2099.045865] <inf> dai_intel_ssp: dai_ssp_stop: dai_ssp_stop SSE clear for SSP0
[ 2099.045878] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_off: intel_adsp_gpdma_power_off: dma dma@7c000 power off
[ 2099.045976] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 0
[ 2099.045988] <wrn> pm_device: runtime_suspend: Unbalanced suspend
[ 2099.045996] <err> host_comp: host_common_trigger: comp:0 0x4 host_trigger(): dma stop failed: -120
[ 2099.046003] <err> pipe: pipeline_trigger_run: pipe:0 0x0 pipeline_trigger_run(): ret = -120, host->comp.id = 4, cmd = 0
[ 2099.046010] <err> ipc: ipc4_pipeline_trigger: pipeline 0: trigger cmd 0 failed with: -120
[ 2099.046015] <err> ipc: ipc_compound_post_start: failed to process msg 19 status 164
[ 2099.046021] <err> ipc: ipc_cmd: ipc4: FW_GEN_MSG failed with err 164
[ 2099.046566] <inf> ipc: ipc_cmd: rx	: 0x46000002|0x3
[ 2099.047040] <inf> ipc: ipc_cmd: rx	: 0x12000000|0x0
[ 2099.047048] <err> ipc: ipc_comp_free: ipc_comp_free(): comp id: 4 state is 3 cannot be freed
[ 2099.047055] <err> ipc: ipc_pipeline_free: ipc_pipeline_free(): module free () failed
[ 2099.047060] <err> ipc: ipc_cmd: ipc4: FW_GEN_MSG failed with err 12
[ 2099.047506] <inf> ipc: ipc_cmd: rx	: 0x12010000|0x0
[ 2099.047666] <inf> dma: dma_put: dma_put(), dma = 0x400f78b0, sref = 0
[ 2099.047680] <inf> dai_intel_ssp: dai_ssp_pm_runtime_dis_ssp_power: dai_ssp_pm_runtime_dis_ssp_power index 0
[ 2099.047686] <inf> dai_intel_ssp: dai_ssp_pm_runtime_dis_ssp_power: dai_ssp_pm_runtime_dis_ssp_power I2SLCTL
[ 2099.683865] <inf> ipc: ipc_cmd: rx	: 0x11000007|0x0
[ 2099.683883] <err> ipc: ipc4_create_pipeline: ipc: comp id is already taken, pipe_desc->instance_id = 0
[ 2099.683890] <err> ipc: ipc_cmd: ipc4: FW_GEN_MSG failed with err 9
[ 2099.694961] <inf> ipc: ipc_cmd: rx	: 0x47000000|0x0
[ 2099.695546] <inf> ipc: ipc_cmd: rx	: 0x11000002|0x100000
[ 2099.695603] <err> ipc: ipc4_create_pipeline: ipc: comp id is already taken, pipe_desc->instance_id = 0
[ 2099.695611] <err> ipc: ipc_cmd: ipc4: FW_GEN_MSG failed with err 9
[ 2099.696271] <inf> ipc: ipc_cmd: rx	: 0x47000000|0xTerminated

marc-hb avatar Jan 11 '24 21:01 marc-hb

Long before the test fails, there is other <err> dai_nhlt_get_clock_div() on every iteration - even when the test passes.

cc: @softwarecki from https://github.com/zephyrproject-rtos/zephyr/pull/60172#discussion_r1449424496

[ 1571.419130] <inf> dai_intel_dmic_nhlt: print_outcontrol: OUTCONTROL = 00390843
[ 1571.419135] <inf> dai_intel_dmic_nhlt: print_outcontrol:   tie=0, sip=0, finit=0, fci=0
[ 1571.419141] <inf> dai_intel_dmic_nhlt: print_outcontrol:   bfth=3, of=2, ipm=2, th=3
[ 1571.419146] <inf> dai_intel_dmic_nhlt: print_outcontrol:   ipms1=0, ipms2=1, ipms3=0, ipms4=0
[ 1571.419151] <inf> dai_intel_dmic_nhlt: print_outcontrol:   ipms_mode=1
[ 1571.419160] <inf> dai_intel_dmic_nhlt: dai_dmic_configure_coeff: fir_length_a = 101, fir_length_b = 247, packed = 0
[ 1571.419176] <inf> dai_intel_dmic_nhlt: dai_dmic_configure_coeff: fir_length_a = 101, fir_length_b = 247, packed = 0
[ 1571.419191] <err> dai_intel_dmic_nhlt: dai_nhlt_get_clock_div: pdm = 0, FIR_CONFIG = 0x00010064
[ 1571.419198] <err> dai_intel_dmic_nhlt: dai_nhlt_get_clock_div: dai_index = 0, rate_div = 800, p_clkdiv = 16, p_mcic = 25, p_mfir = 2
[ 1571.419203] <inf> dai_intel_dmic_nhlt: dai_nhlt_update_rate: rate = 48000, channels = 4, format = 2
[ 1571.419210] <inf> dai_intel_dmic_nhlt: dai_nhlt_update_rate: io_clk 38400000, rate_div 800
[ 1571.419215] <inf> dai_intel_dmic_nhlt: dai_dmic_set_config_nhlt: dmic_set_config_nhlt(): enable0 3, enable1 3
[ 1571.420833] <inf> ipc: ipc_cmd: rx	: 0x13000003|0x1
[ 1571.420845] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 2
[ 1571.421788] <inf> ll_schedule: zephyr_ll_task_done: task complete 0xa01176c0 0x20210U
[ 1571.421795] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 2 total_num_tasks 2

EDIT: small log level issue, I fixed this in Zephyr.

marc-hb avatar Jan 11 '24 21:01 marc-hb

I just found #8686. @RanderWang is this a duplicate? If yes please close.

marc-hb avatar Jan 12 '24 01:01 marc-hb

I just found #8686. @RanderWang is this a duplicate? If yes please close.

yes, the same issue in different test case

RanderWang avatar Jan 12 '24 02:01 RanderWang

I don't know about the Zephyr side but from an SOF perspective there is no rush to merge this as the failure is extremely rare.

I'd rather the current major regression (DSP panic above) to be fixed first to keep git history / bisect and validation simpler.

marc-hb avatar Jan 12 '24 20:01 marc-hb

@marc-hb Same comment as https://github.com/thesofproject/sof/issues/8686#issuecomment-1921195921

kv2019i avatar Feb 01 '24 12:02 kv2019i

I don't know about 8686 but this one was closed as duplicate.

marc-hb avatar Feb 01 '24 17:02 marc-hb

Re-opening to make it easier to find. With the current, messy status it's better to have one duplicate open rather than missing useful information.

marc-hb avatar Feb 01 '24 17:02 marc-hb

I haven't seen this in the recent weeks.

marc-hb avatar Apr 26 '24 21:04 marc-hb