
KVM and EL2 on XPS13

Open Paul-Mysten opened this issue 4 months ago • 17 comments

Figured I'd create this to discuss the current state and possibilities for moving it forward. With SLbounce and current kernels, booting at EL2 seems to work just fine; even KVM seems to come up happily and recognize EL2 on all cores.

However, upon starting a VM with hardware acceleration, the entire machine halts immediately. I haven't even been able to get so much as a crash dump. So I'm looking for steps to try and debug this and maybe get hardware virtualization working.

I'll have to find my notes on exactly what I've tried, but so far the problem is that it halts and reboots too quickly for anything to give me useful info. I've stopped just short of kernel debugging with a remote machine due to time, but plan on picking that up soon.

So any tips? Suggestions? Methodology? Know of any existing discussions or people working on this specific topic? Maybe an IRC or similar?

Thanks

Paul-Mysten avatar Aug 31 '25 22:08 Paul-Mysten

Hardware acceleration would be GPU? I have KVM running on a Windows Dev Kit 2023 and on a T14s. Never cared about hardware acceleration, although on X1 you need to add id_aa64mmfr0.ecv=1 to the kernel command line to get running VMs.
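A minimal sketch for confirming that a boot parameter such as id_aa64mmfr0.ecv=1 actually reached the running kernel, rather than only being present in a boot config file. The helper names `has_kernel_param` and `read_cmdline` are hypothetical; the only thing taken from the thread is the parameter itself.

```python
from typing import Optional


def has_kernel_param(cmdline: str, name: str, value: Optional[str] = None) -> bool:
    """Return True if `name` (optionally as `name=value`) is a boot argument.

    `cmdline` is the whitespace-separated kernel command line, for example
    the contents of /proc/cmdline.
    """
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key != name:
            continue
        # With no value requested, the bare key is enough; otherwise the
        # token must be exactly name=value.
        if value is None or (sep == "=" and val == value):
            return True
    return False


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the currently running kernel (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

On the host, `has_kernel_param(read_cmdline(), "id_aa64mmfr0.ecv", "1")` would report whether the override is in effect for the current boot.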

jglathe avatar Sep 01 '25 06:09 jglathe

No, hardware/native virtualization, as in not paravirtualization but actual hardware level using EL2

Paul-Mysten avatar Sep 01 '25 21:09 Paul-Mysten

… on EL2. Can run real VMs on both.

jglathe avatar Sep 01 '25 22:09 jglathe

... On a Dell XPS 13 9345, it crashes... When running SLbounce, with full EL2 per dmesg, using qemu-system-aarch64 -M virt -cpu host -smp 1 -m 1G -nographic -enable-kvm leads to an instant kernel panic and reboot of the host machine. In fact it goes down so fast that there are no dumps, no logs, nothing. Even with tracing enabled and reserved memory, there is virtually no info.

Paul-Mysten avatar Sep 02 '25 03:09 Paul-Mysten

Did you add the parameter in the kernel command line?

jglathe avatar Sep 02 '25 05:09 jglathe

Yes I have that in the kernel args when booting.

Paul-Mysten avatar Sep 04 '25 20:09 Paul-Mysten

Hmm I did a little testing on the T14s where I have this parameter. Same command does not result in a functional VM you could use, but it doesn't kill the host either. I can bring up a Win11 VM, no issue, and I will install lxd and bring up a vm when I have some time to tinker.

jglathe avatar Sep 04 '25 20:09 jglathe

Yeah, I wish I had had more time over the last week or so to actually gather real info. But as far as I can tell it's only the Dell XPS 13 9345 with this issue at the moment, and it shouldn't be that much different from known working devices... I'll try to post the configs for a sanity check this weekend, and some logs if I can manage to get any. My boot config has been edited by hand so many times that it absolutely could be a problem I created.

Paul-Mysten avatar Sep 04 '25 20:09 Paul-Mysten

Interesting, thanks for looking. One of my goals for this thread was to find out which methods are good for capturing these quick crashes. I'm more familiar with embedded devices and capturing debug output over serial lines, which I guess I could do here as well. But first I wanted to see if a better starting place exists, since the previous attempts with journalctl and similar don't seem to provide anything of value, so I don't imagine the serial output will either unless I pepper the code with printk calls.

Paul-Mysten avatar Sep 05 '25 20:09 Paul-Mysten

May I ask what kernel / distro you're using? I'm using either Ubuntu Concept 25.04 kernels or my own, which have a slightly extended kernel config, but KVM support is on in both of them. Currently setting up lxd on the T14s.

jglathe avatar Sep 05 '25 20:09 jglathe

Sure, Ubuntu 25.10 Concept with a couple of different 6.16 kernels from your branches and the default one. Also tried some of the upstream branches. No luck with KVM on any so far, but otherwise everything else is fairly stable; there are only a few crashes when doing things like resuming from sleep on power loss, but I don't even remember which kernel did that, and it's not all of them.

I made sure to use the EL2 kernels as well, I verified the CPU cores are all on EL2 via dmesg, kvm init looks good, etc.
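The dmesg checks described here could be scripted roughly as follows. This is a sketch, not a definitive test: the two message patterns are assumptions based on what mainline arm64 kernels typically print, the exact wording can differ between kernel versions, and `el2_status` is a hypothetical helper name.

```python
import re

# Assumed message texts (not quoted from this thread):
#   "CPU: All CPU(s) started at EL2"
#   "kvm [1]: VHE mode initialized successfully" (or another "... mode" variant)
EL2_BOOT = re.compile(r"CPU: All CPU\(s\) started at EL2")
KVM_INIT = re.compile(r"kvm \[\d+\]: .*mode initialized successfully")


def el2_status(dmesg_text: str) -> dict:
    """Scan dmesg output for the EL2 boot line and the KVM init line."""
    lines = dmesg_text.splitlines()
    return {
        "started_at_el2": any(EL2_BOOT.search(line) for line in lines),
        "kvm_initialized": any(KVM_INIT.search(line) for line in lines),
    }
```

Feeding it the output of `dmesg` from a fresh boot gives two booleans; both being true only shows that the host came up at EL2 with KVM initialized, not that guests will run.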

Paul-Mysten avatar Sep 06 '25 00:09 Paul-Mysten

I did some testing the other day. Installed lxd on my T14s, added a 25.04 VM, enforced -enable-kvm, checked that it actually runs with it, and no crash. I had one "crash" overnight, but that one looks quite like a reboot, only the logind lines are missing; reason unknown. The box didn't come up again because it booted the EL2 kernel with no slbounce loaded beforehand. So, I'm a bit at a loss here. Do you use the -el2 version of the dtb in the kernel packages? The sudden crash points to something that either tz or the smmu does. The smmu is one thing that gets modified for el2: it enables the arm-smmuv3, which is usually under HYP control (Gunyah on X1).

jglathe avatar Sep 08 '25 05:09 jglathe

Same OP, so long story short I got to a point where I could prevent crashes with the following:

menuentry 'EL2' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-6.16.0-rc6-jg-2-qcom-x1e-advanced-ce06a3ba-4b1f-4d50-ab8e-cb82b2ed159f' {
        recordfail
        load_video
        gfxmode $linux_gfx_mode
        insmod gzio
        if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
        insmod part_gpt
        insmod ext2
        search --no-floppy --fs-uuid --set=root cbab1051-3824-4b26-a521-a2aa8e41a31d
        echo    'Loading Linux 6.16.0-rc6-jg-2-qcom-x1e ...'
        linux   /vmlinuz-6.16.0-rc6-jg-2-qcom-x1e root=/dev/mapper/ubuntu--vg-ubuntu--lv ro  clk_ignore_unused pd_ignore_unused cma=128M nosplash console=tty0 modprobe.blacklist=qcom_geni_serial crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M, id_aa64mmfr0.ecv=1
        echo    'Loading initial ramdisk ...'
        initrd  /initrd.img-6.16.0-rc6-jg-2-qcom-x1e
        echo    'Loading device tree blob...'
        devicetree      /x1e-el2.dtb
}
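The crashkernel= argument in the entry above uses the range form documented for kdump, where each comma-separated item is start-end:size and the item whose range contains the system RAM decides how much memory is reserved. A minimal sketch of that selection, for sanity-checking a hand-edited line, follows. It assumes binary K/M/G/T suffixes, ignores any @offset part and the kernel's rounding of total RAM, and skips empty items such as the one left by the trailing comma in the entry above; `parse_size` and `crashkernel_reservation` are hypothetical names.

```python
from typing import Optional

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text: str) -> int:
    """Convert strings like '512M' or '4G' to a byte count."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].upper()]
    return int(text)


def crashkernel_reservation(value: str, system_ram: int) -> Optional[int]:
    """Return the reserved size in bytes for `system_ram`, or None if no range matches.

    `value` is the text after 'crashkernel=' in range form.
    """
    for entry in value.split(","):
        if not entry:
            continue  # sketch choice: tolerate a stray or trailing comma
        ram_range, _, size = entry.partition(":")
        start, _, end = ram_range.partition("-")
        low = parse_size(start)
        high = parse_size(end) if end else float("inf")  # open-ended range
        if low <= system_ram < high:
            return parse_size(size)
    return None
```

With the value from the entry above, a machine reporting 16 GiB of RAM would fall into the 4G-32G range in this sketch.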

Now when I run the earlier qemu-system-aarch64 -M virt -cpu host -smp 1 -m 1G -nographic -enable-kvm command, instead of crashing I get a near-total lockup with very brief moments of control, during which I could kill the VM. From this I was finally able to get some logs:

dmesg

[  287.078052] geni_i2c 884000.i2c: Timeout abort_m_cmd
[  289.061158] power_supply qcom-battmgr-bat: driver failed to report `serial_number' property: -110
[  290.086155] power_supply qcom-battmgr-bat: driver failed to report `voltage_now' property: -110
[  291.110158] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  292.134114] power_supply qcom-battmgr-bat: driver failed to report `cycle_count' property: -110
[  293.158181] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  295.078201] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  296.101126] power_supply qcom-battmgr-bat: driver failed to report `energy_full_design' property: -110
[  298.025147] power_supply qcom-battmgr-bat: driver failed to report `voltage_max_design' property: -110
[  299.110144] [drm:dpu_encoder_frame_done_timeout:2715] [dpu error]enc38 frame done timeout
[  299.230175] [drm:dpu_encoder_frame_done_timeout:2715] [dpu error]enc38 frame done timeout
[  300.197224] power_supply qcom-battmgr-bat: driver failed to report `energy_now' property: -110
[  300.712115] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  300.712291] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  300.712382] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  301.224128] geni_i2c 884000.i2c: Timeout abort_m_cmd
[  301.734175] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  301.734339] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  301.734429] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  302.328168] power_supply qcom-battmgr-bat: driver failed to report `voltage_max_design' property: -110
[  302.696069] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  302.696233] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  302.696323] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  303.334138] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  303.721131] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  303.721294] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  303.721386] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  304.357070] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  304.743029] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  304.743192] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  304.743283] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  305.385111] power_supply qcom-battmgr-bat: driver failed to report `temp' property: -110
[  305.701072] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  305.701235] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  305.701327] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  306.408014] power_supply qcom-battmgr-bat: driver failed to report `cycle_count' property: -110
[  306.729016] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  306.729181] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  306.729273] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  307.430004] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  307.751063] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  307.751227] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  307.751318] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  308.453984] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  308.711983] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  308.712146] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  308.712237] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  309.476957] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  309.732944] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  309.733108] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  309.733199] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  310.505024] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  310.695993] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  310.696157] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  310.696247] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  311.527999] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  311.718907] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  311.719071] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  311.719161] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  312.741956] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  312.742119] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  312.742211] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  313.703876] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  313.704040] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  313.704131] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  314.725951] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  314.726118] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 6880
[  314.726209] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 6881
[  314.860321] msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 67.5.12.1: hangcheck recover!
[  314.860503] msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 67.5.12.1: hangcheck recover!
[  314.966866] [drm:dpu_encoder_frame_done_timeout:2715] [dpu error]enc38 frame done timeout
[  315.876868] power_supply qcom-battmgr-bat: driver failed to report `cycle_count' property: -110
[  316.900841] geni_i2c 884000.i2c: Timeout abort_m_cmd
[  316.900840] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  317.926931] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  318.950894] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  319.975894] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  320.998796] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  325.095846] ------------[ cut here ]------------
[  325.095856] WARNING: CPU: 9 PID: 414 at drivers/soc/qcom/rpmh.c:386 rpmh_write_batch+0x180/0x320
[  325.095879] Modules linked in: aes_ce_ccm michael_mic snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat bridge stp llc binfmt_misc nls_iso8859_1 qrtr_mhi ath12k mac80211 cfg80211 libarc4 mhi snd_soc_lpass_va_macro snd_soc_lpass_wsa_macro snd_soc_lpass_tx_macro snd_soc_lpass_rx_macro snd_soc_hdmi_codec snd_soc_lpass_macro_common snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pm8941_pwrkey qcom_spmi_temp_alarm snd_seq_device industrialio snd_timer snd pci_pwrctrl_pwrseq pci_pwrctrl_core soundcore qcom_edac leds_gpio input_leds joydev sch_fq_codel nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_masq nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 dm_crypt nvme nvme_core nvme_keyring nvme_auth hid_multitouch qcom_pon qrtr_smd reboot_mode nvmem_qcom_spmi_sdam rtc_pm8xxx hid_generic i2c_hid_of i2c_hid
[  325.096104]  qcom_pd_mapper rpmsg_ctrl hid ps883x phy_nxp_ptn3222 msm polyval_ce ghash_ce sm4_ce_gcm qcom_q6v5_pas sm4_ce_ccm qcom_pil_info qcom_spmi_pmic drm_exec qcom_stats sm4_ce phy_qcom_edp qcom_common dispcc_x1e80100 ocmem sm4_ce_cipher qcom_glink_smem gpu_sched sm4 pinctrl_sm8550_lpass_lpi ucsi_glink qcom_q6v5 qcom_sysmon sm3_ce videocc_sm8550 i2c_qcom_geni phy_snps_eusb2 pinctrl_lpass_lpi lpasscc_sc8280xp gpucc_x1e80100 mdt_loader tcsrcc_x1e80100 icc_bwmon qcom_cpucp_mbox typec_ucsi socinfo qcom_battmgr pwrseq_qcom_wcn sha3_ce sha1_ce pwrseq_core gpio_keys uio_pdrv_genirq fixed uio qrtr aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  325.096150] CPU: 9 UID: 0 PID: 414 Comm: gpu-worker Kdump: loaded Not tainted 6.16.0-rc6-jg-2-qcom-x1e #2 PREEMPT(voluntary) 
[  325.096152] Hardware name: Dell Inc. XPS 13 9345/0W8JXV, BIOS 2.8.0 04/30/2025
[  325.096154] pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  325.096156] pc : rpmh_write_batch+0x180/0x320
[  325.096158] lr : rpmh_write_batch+0x178/0x320
[  325.096160] sp : ffff800083afb4a0
[  325.096161] x29: ffff800083afb4a0 x28: ffffcd743e6976e0 x27: 0000000000000004
[  325.096164] x26: 0000000000000000 x25: ffffcd743d1778e0 x24: 0000000000000000
[  325.096167] x23: ffff0008135d8108 x22: ffff0008135d8108 x21: ffff0008135d8128
[  325.096170] x20: 0000000000000000 x19: 0000000000000000 x18: ffff800083aad0c0
[  325.096173] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaabb4ebde0
[  325.096175] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  325.096178] x11: 0000000000000000 x10: 2bc0536d90a4b249 x9 : ffffcd743c9278f8
[  325.096180] x8 : ffff000819886a48 x7 : 0000000000000000 x6 : 0000000000000000
[  325.096183] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[  325.096185] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[  325.096188] Call trace:
[  325.096190]  rpmh_write_batch+0x180/0x320 (P)
[  325.096192]  qcom_icc_bcm_voter_commit+0x284/0x520
[  325.096197]  qcom_icc_set+0x2c/0x58
[  325.096199]  apply_constraints+0x88/0x100
[  325.096200]  icc_set_bw+0xc8/0x318
[  325.096202]  _set_opp_bw+0x5c/0x108
[  325.096205]  _set_opp+0x280/0x648
[  325.096206]  dev_pm_opp_set_opp+0x7c/0x120
[  325.096208]  a6xx_gmu_set_freq+0x26c/0x518 [msm]
[  325.096241]  a6xx_gpu_set_freq+0x48/0x80 [msm]
[  325.096259]  msm_devfreq_target+0x9c/0x198 [msm]
[  325.096278]  devfreq_set_target+0x9c/0x248
[  325.096281]  devfreq_update_target+0xe4/0x110
[  325.096282]  qos_min_notifier_call+0x3c/0xa0
[  325.096284]  notifier_call_chain+0x84/0x168
[  325.096288]  blocking_notifier_call_chain+0x50/0xd0
[  325.096290]  pm_qos_update_target+0xe0/0x1a8
[  325.096293]  freq_qos_apply+0xac/0xe0
[  325.096296]  apply_constraint+0xb4/0x200
[  325.096299]  __dev_pm_qos_update_request+0xa4/0x220
[  325.096302]  dev_pm_qos_update_request+0x44/0x80
[  325.096304]  msm_devfreq_boost_work+0x24/0x50 [msm]
[  325.096321]  kthread_worker_fn+0xf4/0x2d0
[  325.096323]  kthread+0x114/0x130
[  325.096326]  ret_from_fork+0x10/0x20
[  325.096328] ---[ end trace 0000000000000000 ]---
[  325.096330] Error sending AMC RPMH requests (-110)
[  331.240731]  6800000.remoteproc:glink-edge: intent request timed out
[  331.240748] power_supply qcom-battmgr-bat: driver failed to report `status' property: -110
[  332.012637] ------------[ cut here ]------------
[  332.012639] UBSAN: invalid-load in /home/user/code/el2/linux_ms_dev_kit-jg-ubuntu-qcom-x1e-6.16.0-rc6-jg-2/drivers/soc/qcom/rpmh.c:84:7
[  332.012643] load of value 24 is not a valid value for type '_Bool'
[  332.012647] CPU: 0 UID: 1000 PID: 4800 Comm: qemu-system-arm Kdump: loaded Tainted: G        W           6.16.0-rc6-jg-2-qcom-x1e #2 PREEMPT(voluntary) 
[  332.012650] Tainted: [W]=WARN
[  332.012650] Hardware name: Dell Inc. XPS 13 9345/0W8JXV, BIOS 2.8.0 04/30/2025
[  332.012652] Call trace:
[  332.012653]  show_stack+0x38/0xa0 (C)
[  332.012660]  dump_stack_lvl+0x84/0xc0
[  332.012663]  dump_stack+0x1c/0x40
[  332.012665]  ubsan_epilogue+0x14/0x60
[  332.012667]  __ubsan_handle_load_invalid_value+0xc4/0xf0
[  332.012671]  rpmh_tx_done+0xb4/0xc0
[  332.012674]  tcs_tx_done+0x204/0x460
[  332.012676]  __handle_irq_event_percpu+0x68/0x2a0
[  332.012679]  handle_irq_event+0x58/0xe0
[  332.012680]  handle_fasteoi_irq+0xac/0x1e0
[  332.012682]  handle_irq_desc+0x40/0xa0
[  332.012684]  generic_handle_domain_irq+0x28/0x50
[  332.012685]  __gic_handle_irq_from_irqson.isra.0+0x194/0x378
[  332.012687]  gic_handle_irq+0x2c/0xa0
[  332.012689]  do_interrupt_handler+0x5c/0xb8
[  332.012690]  el1_interrupt+0x48/0xf8
[  332.012693]  el1h_64_irq_handler+0x1c/0x40
[  332.012695]  el1h_64_irq+0x84/0x88
[  332.012696]  handle_softirqs+0xb0/0x420 (P)
[  332.012700]  __do_softirq+0x20/0x3c
[  332.012701]  ____do_softirq+0x1c/0x40
[  332.012702]  call_on_irq_stack+0x3c/0x50
[  332.012703]  do_softirq_own_stack+0x28/0x60
[  332.012705]  __irq_exit_rcu+0x184/0x1c8
[  332.012706]  irq_exit_rcu+0x1c/0x40
[  332.012707]  el1_interrupt+0x4c/0xf8
[  332.012708]  el1h_64_irq_handler+0x1c/0x40
[  332.012710]  el1h_64_irq+0x84/0x88
[  332.012711]  kvm_arch_vcpu_ioctl_run+0x5bc/0x750 (P)
[  332.012714]  kvm_vcpu_ioctl+0x1a0/0xbe0
[  332.012717]  __arm64_sys_ioctl+0xd0/0x160
[  332.012720]  invoke_syscall+0x70/0x120
[  332.012723]  el0_svc_common.constprop.0+0x4c/0x140
[  332.012725]  do_el0_svc+0x28/0x60
[  332.012727]  el0_svc+0x40/0x190
[  332.012728]  el0t_64_sync_handler+0x134/0x160
[  332.012730]  el0t_64_sync+0x1b8/0x1c0
[  332.012731] ---[ end trace ]---
[  343.012613] ath12k_pci 0004:01:00.0: time out while waiting for get fw stats
[  349.028536] ath12k_pci 0004:01:00.0: time out while waiting for get fw stats
[  357.029486] ath12k_pci 0004:01:00.0: wmi command 90113 timeout
[  357.029498] ath12k_pci 0004:01:00.0: failed to send WMI_REQUEST_STATS cmd
[  357.029507] ath12k_pci 0004:01:00.0: failed to request fw stats: -11
[  363.044440] ath12k_pci 0004:01:00.0: wmi command 90113 timeout
[  363.044453] ath12k_pci 0004:01:00.0: failed to send WMI_REQUEST_STATS cmd
[  363.044462] ath12k_pci 0004:01:00.0: failed to request fw stats: -11
[  363.110529] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [qemu-system-arm:4800]
[  363.110532] Modules linked in: aes_ce_ccm michael_mic snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat bridge stp llc binfmt_misc nls_iso8859_1 qrtr_mhi ath12k mac80211 cfg80211 libarc4 mhi snd_soc_lpass_va_macro snd_soc_lpass_wsa_macro snd_soc_lpass_tx_macro snd_soc_lpass_rx_macro snd_soc_hdmi_codec snd_soc_lpass_macro_common snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pm8941_pwrkey qcom_spmi_temp_alarm snd_seq_device industrialio snd_timer snd pci_pwrctrl_pwrseq pci_pwrctrl_core soundcore qcom_edac leds_gpio input_leds joydev sch_fq_codel nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_masq nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 dm_crypt nvme nvme_core nvme_keyring nvme_auth hid_multitouch qcom_pon qrtr_smd reboot_mode nvmem_qcom_spmi_sdam rtc_pm8xxx hid_generic i2c_hid_of i2c_hid
[  363.110583]  qcom_pd_mapper rpmsg_ctrl hid ps883x phy_nxp_ptn3222 msm polyval_ce ghash_ce sm4_ce_gcm qcom_q6v5_pas sm4_ce_ccm qcom_pil_info qcom_spmi_pmic drm_exec qcom_stats sm4_ce phy_qcom_edp qcom_common dispcc_x1e80100 ocmem sm4_ce_cipher qcom_glink_smem gpu_sched sm4 pinctrl_sm8550_lpass_lpi ucsi_glink qcom_q6v5 qcom_sysmon sm3_ce videocc_sm8550 i2c_qcom_geni phy_snps_eusb2 pinctrl_lpass_lpi lpasscc_sc8280xp gpucc_x1e80100 mdt_loader tcsrcc_x1e80100 icc_bwmon qcom_cpucp_mbox typec_ucsi socinfo qcom_battmgr pwrseq_qcom_wcn sha3_ce sha1_ce pwrseq_core gpio_keys uio_pdrv_genirq fixed uio qrtr aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  363.110616] CPU: 1 UID: 1000 PID: 4800 Comm: qemu-system-arm Kdump: loaded Tainted: G        W           6.16.0-rc6-jg-2-qcom-x1e #2 PREEMPT(voluntary) 
[  363.110619] Tainted: [W]=WARN
[  363.110620] Hardware name: Dell Inc. XPS 13 9345/0W8JXV, BIOS 2.8.0 04/30/2025
[  363.110621] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  363.110623] pc : kvm_arch_vcpu_ioctl_run+0x5bc/0x750
[  363.110629] lr : kvm_arch_vcpu_ioctl_run+0x274/0x750
[  363.110631] sp : ffff8000945bb8a0
[  363.110632] x29: ffff8000945bb8a0 x28: ffff00087f75a380 x27: 0000000000000000
[  363.110634] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[  363.110636] x23: ffff00087d90d000 x22: 0000000000000001 x21: ffff00087f75a380
[  363.110637] x20: 0000000000000000 x19: ffff000880b453a0 x18: ffff800092bed040
[  363.110639] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  363.110641] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  363.110642] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffcd743b0bcc1c
[  363.110644] x8 : ffff8000945bb870 x7 : 0000000000000000 x6 : 0000000000000000
[  363.110645] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[  363.110646] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[  363.110648] Call trace:
[  363.110649]  kvm_arch_vcpu_ioctl_run+0x5bc/0x750 (P)
[  363.110652]  kvm_vcpu_ioctl+0x1a0/0xbe0
[  363.110655]  __arm64_sys_ioctl+0xd0/0x160
[  363.110657]  invoke_syscall+0x70/0x120
[  363.110660]  el0_svc_common.constprop.0+0x4c/0x140
[  363.110663]  do_el0_svc+0x28/0x60
[  363.110665]  el0_svc+0x40/0x190
[  363.110667]  el0t_64_sync_handler+0x134/0x160
[  363.110668]  el0t_64_sync+0x1b8/0x1c0
[  381.189061] mhi mhi0: Requested to power ON
[  381.189078] mhi mhi0: Power on setup success
[  381.293336] mhi mhi0: Wait for device to enter SBL or Mission mode
[  381.682443] ath12k_pci 0004:01:00.0: chip_id 0x2 chip_family 0x4 board_id 0xff soc_id 0x40170200
[  381.682459] ath12k_pci 0004:01:00.0: fw_version 0x1108811c fw_build_timestamp 2025-05-17 00:21 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HMT.1.1.c5-00284.1-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
[  381.762936] ath12k_pci 0004:01:00.0: Last interrupt received for each CE:
[  381.762953] ath12k_pci 0004:01:00.0: CE_id 0 pipe_num 0 368036ms before
[  381.762963] ath12k_pci 0004:01:00.0: CE_id 1 pipe_num 1 321978ms before
[  381.762971] ath12k_pci 0004:01:00.0: CE_id 2 pipe_num 2 18652ms before
[  381.762978] ath12k_pci 0004:01:00.0: CE_id 3 pipe_num 3 33753ms before
[  381.762985] ath12k_pci 0004:01:00.0: CE_id 5 pipe_num 5 81758ms before
[  381.762992] ath12k_pci 0004:01:00.0: 
               Last interrupt received for each group:
[  381.762999] ath12k_pci 0004:01:00.0: group_id 0 58398ms before
[  381.763006] ath12k_pci 0004:01:00.0: group_id 1 81758ms before
[  381.763014] ath12k_pci 0004:01:00.0: group_id 2 66511ms before
[  381.763021] ath12k_pci 0004:01:00.0: group_id 3 39277ms before
[  381.763028] ath12k_pci 0004:01:00.0: group_id 4 119495ms before
[  381.763035] ath12k_pci 0004:01:00.0: group_id 5 125436ms before
[  381.763043] ath12k_pci 0004:01:00.0: group_id 6 125437ms before
[  381.763050] ath12k_pci 0004:01:00.0: group_id 7 81758ms before
[  381.763057] ath12k_pci 0004:01:00.0: group_id 8 81758ms before
[  381.763064] ath12k_pci 0004:01:00.0: group_id 9 81758ms before
[  381.763071] ath12k_pci 0004:01:00.0: group_id 10 81758ms before
[  381.763080] ath12k_pci 0004:01:00.0: dst srng id 0 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 58398ms
[  381.763090] ath12k_pci 0004:01:00.0: dst srng id 1 tp 3664, cur hp 3664, cached hp 3664 last hp 3664 napi processed before 39277ms
[  381.763100] ath12k_pci 0004:01:00.0: dst srng id 2 tp 1592, cur hp 1592, cached hp 1592 last hp 1592 napi processed before 119495ms
[  381.763108] ath12k_pci 0004:01:00.0: dst srng id 3 tp 104, cur hp 104, cached hp 104 last hp 104 napi processed before 125436ms
[  381.763117] ath12k_pci 0004:01:00.0: dst srng id 4 tp 56, cur hp 56, cached hp 56 last hp 56 napi processed before 125437ms
[  381.763125] ath12k_pci 0004:01:00.0: dst srng id 5 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 81758ms
[  381.763134] ath12k_pci 0004:01:00.0: dst srng id 6 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 81758ms
[  381.763141] ath12k_pci 0004:01:00.0: dst srng id 7 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 81758ms
[  381.763149] ath12k_pci 0004:01:00.0: dst srng id 8 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 81758ms
[  381.763157] ath12k_pci 0004:01:00.0: src srng id 16 hp 0, reap_hp 248, cur tp 0, cached tp 0 last tp 0 napi processed before 81758ms
[  381.763166] ath12k_pci 0004:01:00.0: src srng id 20 hp 444, reap_hp 444, cur tp 240, cached tp 240 last tp 240 napi processed before 18185ms
[  381.763175] ath12k_pci 0004:01:00.0: dst srng id 21 tp 560, cur hp 560, cached hp 560 last hp 560 napi processed before 58398ms
[  381.763184] ath12k_pci 0004:01:00.0: src srng id 24 hp 104, reap_hp 104, cur tp 104, cached tp 96 last tp 96 napi processed before 58398ms
[  381.763193] ath12k_pci 0004:01:00.0: src srng id 25 hp 0, reap_hp 4088, cur tp 0, cached tp 0 last tp 0 napi processed before 81758ms
[  381.763201] ath12k_pci 0004:01:00.0: src srng id 26 hp 1592, reap_hp 1592, cur tp 1584, cached tp 1584 last tp 1584 napi processed before 18717ms
[  381.763212] ath12k_pci 0004:01:00.0: src srng id 64 hp 12, reap_hp 8, cur tp 12, cached tp 12 last tp 8 napi processed before 368036ms
[  381.763221] ath12k_pci 0004:01:00.0: src srng id 67 hp 108, reap_hp 104, cur tp 108, cached tp 108 last tp 104 napi processed before 33753ms
[  381.763245] ath12k_pci 0004:01:00.0: src srng id 68 hp 44, reap_hp 40, cur tp 44, cached tp 44 last tp 36 napi processed before 329566ms
[  381.763254] ath12k_pci 0004:01:00.0: src srng id 82 hp 2, reap_hp 2, cur tp 6, cached tp 6 last tp 6 napi processed before 321979ms
[  381.763263] ath12k_pci 0004:01:00.0: src srng id 83 hp 86, reap_hp 86, cur tp 92, cached tp 92 last tp 92 napi processed before 18653ms
[  381.763271] ath12k_pci 0004:01:00.0: dst srng id 101 tp 12, cur hp 12, cached hp 12 last hp 12 napi processed before 321979ms
[  381.763280] ath12k_pci 0004:01:00.0: dst srng id 102 tp 180, cur hp 180, cached hp 180 last hp 180 napi processed before 18653ms
[  381.763289] ath12k_pci 0004:01:00.0: src srng id 120 hp 65534, reap_hp 65534, cur tp 0, cached tp 0 last tp 0 napi processed before 368044ms
[  381.763300] ath12k_pci 0004:01:00.0: src srng id 121 hp 0, reap_hp 504, cur tp 0, cached tp 0 last tp 0 napi processed before 81759ms
[  381.763309] ath12k_pci 0004:01:00.0: dst srng id 128 tp 104, cur hp 104, cached hp 104 last hp 104 napi processed before 58399ms
[  381.763318] ath12k_pci 0004:01:00.0: dst srng id 130 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 81759ms
[  381.763326] ath12k_pci 0004:01:00.0: dst srng id 131 tp 16, cur hp 16, cached hp 16 last hp 16 napi processed before 58399ms
[  381.763335] ath12k_pci 0004:01:00.0: dst srng id 132 tp 1584, cur hp 1584, cached hp 1584 last hp 1584 napi processed before 66512ms
[  381.763345] ath12k_pci 0004:01:00.0: src srng id 160 hp 1356, reap_hp 1356, cur tp 1356, cached tp 1354 last tp 1354 napi processed before 39278ms
[  381.763354] ath12k_pci 0004:01:00.0: src srng id 161 hp 0, reap_hp 4094, cur tp 0, cached tp 0 last tp 0 napi processed before 81759ms
[  381.763363] ath12k_pci 0004:01:00.0: src srng id 169 hp 0, reap_hp 4094, cur tp 0, cached tp 0 last tp 0 napi processed before 81759ms
[  381.763372] ath12k_pci 0004:01:00.0: src srng id 185 hp 2046, reap_hp 2046, cur tp 0, cached tp 0 last tp 0 napi processed before 368038ms
[  381.763381] ath12k_pci 0004:01:00.0: dst srng id 186 tp 0, cur hp 0, cached hp 0 last hp 0 napi processed before 81759ms
[  381.763389] ath12k_pci 0004:01:00.0: src srng id 193 hp 2046, reap_hp 2046, cur tp 0, cached tp 0 last tp 0 napi processed before 368038ms
[  381.941884] ieee80211 phy0: Hardware restart was requested
[  395.813734] ath12k_pci 0004:01:00.0: pdev 0 successfully recovered
[  422.236199] [drm:dpu_encoder_frame_done_timeout:2715] [dpu error]enc38 frame done timeout
[  423.483590] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  423.500296] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  423.516966] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  423.533654] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  423.550343] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  423.567052] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  423.583719] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  423.600408] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  423.617096] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  423.633784] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  423.988151] [drm:dpu_encoder_frame_done_timeout:2715] [dpu error]enc38 frame done timeout
[  425.701143] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1: hangcheck detected gpu lockup rb 0!
[  425.701317] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     completed fence: 7787
[  425.701408] msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 67.5.12.1:     submitted fence: 7788
[  425.866654] msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 67.5.12.1: hangcheck recover!
[  436.951118] workqueue: bpf_prog_free_deferred hogged CPU for >1000000us 4 times, consider switching to WQ_UNBOUND
[  451.726254] wlP4p1s0: authenticate with 30:68:93:38:58:65 (local address=3c:0a:f3:2e:1c:a9)
[  451.726273] wlP4p1s0: send auth to 30:68:93:38:58:65 (try 1/3)
[  451.740288] wlP4p1s0: authenticated
[  451.742270] wlP4p1s0: associate with 30:68:93:38:58:65 (try 1/3)
[  451.769510] wlP4p1s0: RX AssocResp from 30:68:93:38:58:65 (capab=0x1511 status=0 aid=6)
[  451.788849] wlP4p1s0: associated
[  452.120847] wlP4p1s0: Limiting TX power to 30 (30 - 0) dBm as advertised by 30:68:93:38:58:65
[  453.689300] dpu_crtc_frame_event_cb: 4 callbacks suppressed
[  453.689304] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  453.705994] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  453.722675] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  453.739381] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  453.756164] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  453.772740] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  453.789446] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  453.806116] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  453.822823] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  453.839491] [drm:dpu_crtc_frame_event_cb [msm]] *ERROR* crtc106 event 1 overflow
[  454.614117] workqueue: bpf_prog_free_deferred hogged CPU for >1000000us 5 times, consider switching to WQ_UNBOUND

McFacePunch avatar Sep 09 '25 02:09 McFacePunch

OK, so enabling KVM kills some essential I/O, probably MMIO access. I can't observe that here, though.

jglathe avatar Sep 09 '25 13:09 jglathe

What logs do you think would be helpful to dump here? I didn't see much else of value when I looked, but another pair of eyes would be good.
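Since the dump above is mostly the same few messages repeated, a condensed view might make it easier to spot anything unusual. Here is a minimal sketch, assuming the log is in the default `dmesg` format with a `[seconds.micros]` prefix; the function name `summarize_dmesg` is made up for illustration and is not part of any existing tool.

```python
import re
from collections import Counter

# Matches the "[  381.763221] " timestamp prefix that dmesg prints by default.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*")


def summarize_dmesg(text):
    """Collapse repeated kernel messages (hypothetical helper).

    Returns a list of (first_timestamp, count, message) tuples,
    ordered by the time each distinct message first appeared.
    Lines without a timestamp prefix are skipped.
    """
    counts = Counter()
    first_seen = {}
    for line in text.splitlines():
        match = TIMESTAMP.match(line)
        if not match:
            continue
        message = line[match.end():]
        counts[message] += 1
        # Remember only the earliest timestamp for each distinct message.
        first_seen.setdefault(message, float(match.group(1)))
    return sorted(
        ((first_seen[msg], n, msg) for msg, n in counts.items()),
        key=lambda item: item[0],
    )
```

Saving the output of `dmesg` to a file and passing its contents to `summarize_dmesg` would reduce the `crtc106 event 1 overflow` flood to a single entry with a count, while one-off lines such as the hangcheck messages stay visible.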

Paul-Mysten avatar Sep 20 '25 13:09 Paul-Mysten

I'm still a bit at a loss here. Do you have the newest BIOS? Could you share some data, like an HWiNFO64 (Windows) log? Odd firmware appears to be the most likely cause, though.

jglathe avatar Sep 20 '25 16:09 jglathe

I'm pretty sure I have the stock BIOS and firmware, because the firmware puller utility warned that some of the newer firmware doesn't work due to.... I forget the reason.

I could boot into Windows for the HWiNFO64 stuff, yeah. Ah yes, from the prior output: Hardware name: Dell Inc. XPS 13 9345/0W8JXV, BIOS 2.8.0 04/30/2025
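For the BIOS version specifically, the same fields can probably be read from Linux without a reboot. This is only a sketch, assuming the kernel exposes DMI data under `/sys/class/dmi/id` on this machine (the `Hardware name:` line suggests it does); `read_dmi` and `DMI_FIELDS` are names made up for this example.

```python
from pathlib import Path

# Standard DMI attribute files that are readable without root.
DMI_FIELDS = ("sys_vendor", "product_name", "board_name", "bios_version", "bios_date")


def read_dmi(base="/sys/class/dmi/id"):
    """Return a dict of DMI fields (hypothetical helper).

    A field is None if its file is missing or unreadable, for
    example on a kernel that does not expose DMI data.
    """
    info = {}
    for field in DMI_FIELDS:
        try:
            info[field] = (Path(base) / field).read_text().strip()
        except OSError:
            info[field] = None
    return info
```

Calling `read_dmi()` with no arguments reads the live system. The `base` parameter exists only so the function can be pointed at a copy of the directory taken from another machine.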

Paul-Mysten avatar Sep 22 '25 13:09 Paul-Mysten