linux icon indicating copy to clipboard operation
linux copied to clipboard

6.10 kernel: i915 driver crash in system suspend due to wakeref from SOF code

Open rfvirgil opened this issue 1 year ago • 9 comments

6.10 kernel UpXtreme TGL + CDB35L56-EIGHT-C (SoundWire) Play audio, then system-suspend while audio is still playing.

We're seeing that with the 6.10 kernel there is a WARN triggered in the i915 driver, intel_runtime_pm_driver_release() because there is a wakeref that has not been released. This happens if audio was playing when the system entered system-suspend. For example:

play audio.wav &
sudo rtcwake -m mem -s 5

I enabled the debugging in the i915 driver so that wakeref requests are tracked, and printed out the references when the WARN is triggered, and I get the reftracking info below (full dmesg attached dmesg.txt)

So, snd_sof has taken a wakeref on the i915 driver during probe, but has not released it when the system went to system-suspend. The i915 driver expects that all wakerefs have been released before system-suspending.

[   72.906688] PM: suspend entry (deep)
[   72.910341] Filesystems sync: 0.003 seconds
[   72.913001] Freezing user space processes
[   72.915545] Freezing user space processes completed (elapsed 0.002 seconds)
[   72.915613] OOM killer disabled.
[   72.915617] Freezing remaining freezable tasks
[   72.917124] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[   72.917333] printk: Suspending console(s) (use no_console_suspend to debug)
[   73.255773] cirrus-reset-owner 1013BAAD:00: Suspend
[   73.546490] ref_tracker: 0000:00:02.0@0000000000296b48 has 1/1 users at
                    __intel_runtime_pm_get+0x72/0xf0 [i915]
                    intel_display_power_get+0x27/0x70 [i915]
                    i915_audio_component_get_power+0x20/0x190 [i915]
                    snd_hdac_display_power+0x85/0x130 [snd_hda_core]
                    hda_codec_i915_init+0x3a/0x50 [snd_sof_intel_hda]
                    hda_dsp_probe_early+0x167/0x240 [snd_sof_intel_hda_generic]
                    snd_sof_device_probe+0x24b/0x370 [snd_sof]
                    sof_pci_probe+0x12f/0x200 [snd_sof_pci]
                    local_pci_probe+0x42/0xa0
                    pci_device_probe+0xc7/0x240
                    really_probe+0xd3/0x380
                    __driver_probe_device+0x78/0x150
                    driver_probe_device+0x1f/0x90
                    __driver_attach+0xd6/0x1d0
                    bus_for_each_dev+0x8c/0xe0
                    bus_add_driver+0x115/0x240

[   73.546596] ------------[ cut here ]------------
[   73.546599] i915 0000:00:02.0: i915 raw-wakerefs=1 wakelocks=1 on cleanup
[   73.546648] WARNING: CPU: 0 PID: 573 at drivers/gpu/drm/i915/intel_runtime_pm.c:447 intel_runtime_pm_driver_release+0x66/0x80 [i915]

rfvirgil avatar Jul 23 '24 10:07 rfvirgil

@plbossart FYI

rfvirgil avatar Jul 23 '24 10:07 rfvirgil

@rfvirgil can you clarify which endpoint was used? I would guess it's one of the HDMI endpoints but the description refers to the CS board.

I am afraid I don't fully get the i915 flows, clearly we don't do anything suspending, hda_codec_i915_display_power() is only called in the resume sequence.

this will most likely have to wait for @kv2019i to look at this...

plbossart avatar Jul 23 '24 12:07 plbossart

@ujfalusi FYI since you are in the middle of HDMI debugging.

plbossart avatar Jul 23 '24 12:07 plbossart

@plbossart What do you mean by "endpoint" ? We're playing audio to the CS35L56 on the EIGHT-C board. HDMI is connected to a monitor for console display, but we aren't using it for audio. Also, we aren't using graphics. The UpX doesn't have X or Wayland installed.

As a workaround I've set snd_hda_core.gpu_bind=0

I did start looking at the way the wakeref was implemented in i915 and snd-hda, but it was too complex for me to understand...

I've found hdac_hdmi_runtime_suspend(), which seems to be releasing the wakeref when it runtime-suspends. I'm guessing that the hdac code isn't handling the case where you system-suspend while still playing audio, so you were not runtime-suspended.

rfvirgil avatar Jul 23 '24 12:07 rfvirgil

I cannot reproduce it on upx (running in HDA mode, nothing fancy connected), neither on SDW device which have HDMI (and panel, which might change things). I'm not sure what can cause this atm

ujfalusi avatar Jul 23 '24 13:07 ujfalusi

I've found hdac_hdmi_runtime_suspend(), which seems to be releasing the wakeref when it runtime-suspends. I'm guessing that the hdac code isn't handling the case where you system-suspend while still playing audio, so you were not runtime-suspended.

the system suspend in hdac_hdmi.c seems to rely on pm_runtime_force_suspend(). I am not sure what it does and why this would matter anyways if you are not playing audio on HDMI. the entire i915 audio path should be disabled and powered-down.

Unless there's the silly keep-alive function that keeps HDMI running?

plbossart avatar Jul 23 '24 16:07 plbossart

Unless there's the silly keep-alive function that keeps HDMI running?

Hrm, I have the silent ~~scream~~ stream disabled on all of my devices...

ujfalusi avatar Jul 24 '24 05:07 ujfalusi

I cannot say what is going on...

On UPX-i11 HDA: it is OK under all combination

With SDW device (MTL): gives i915 raw-wakerefs=1 wakelocks=1 on cleanup under any combination, regardless of silent steam or playback

With SDW device (TGL-H): it is OK under all combination

:shrug:

ujfalusi avatar Jul 24 '24 09:07 ujfalusi

Unless there's the silly keep-alive function that keeps HDMI running?

In a totally different i915 issue I recently discovered that DRM is polling for monitor hotplug by default even when everything is off. It's configured by /sys/module/drm_kms_helper/parameters/poll/intel_dp_detect

You can see it with options drm debug=0x2, example:

Jul 17 16:20:59 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_resume [i915]] Resuming device
Jul 17 16:20:59 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_resume [i915]] Device resumed
Jul 17 16:21:11 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_suspend [i915]] Suspending device
Jul 17 16:21:11 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_suspend [i915]] Device suspended
Jul 17 16:21:21 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_resume [i915]] Resuming device
Jul 17 16:21:21 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_resume [i915]] Device resumed

Sorry if off-topic, I thought it was worth sharing.

marc-hb avatar Jul 29 '24 18:07 marc-hb

@rfvirgil, I have tried with newer kernel on MTL-SDW and I cannot reproduce the issue which exhibited it last July. Is it possible that it is now fixed? Can you check and close the issue in that case?

ujfalusi avatar Feb 28 '25 13:02 ujfalusi

We haven't seen this with recent kernels so you can close this ticket.

rfvirgil avatar Feb 28 '25 13:02 rfvirgil