[Build 2023122703-4.2] sys-usb panics on Framework 13 AMD with kernel 6.1.62
Observation
openQA test in scenario qubesos-4.2-kernel-x86_64-install_default_upload@hw12 fails in firstboot
[2023-12-27 12:36:02] [ 2.430791] thunderbolt 0000:00:0b.0: Failed to create MSI-X! ret=-19!
[2023-12-27 12:36:02] [ 2.430803] list_del corruption, ffff99605255c0c0->next is LIST_POISON1 (dead000000000100)
[2023-12-27 12:36:02] [ 2.430815] ------------[ cut here ]------------
[2023-12-27 12:36:02] [ 2.430821] kernel BUG at lib/list_debug.c:53!
[2023-12-27 12:36:02] [ 2.430829] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[2023-12-27 12:36:02] [ 2.430835] CPU: 1 PID: 296 Comm: (udev-worker) Not tainted 6.1.62-1.qubes.fc37.x86_64 #1
[2023-12-27 12:36:02] [ 2.430843] Hardware name: Xen HVM domU, BIOS 4.17.2 12/10/2023
[2023-12-27 12:36:02] [ 2.430849] RIP: 0010:__list_del_entry_valid.cold+0x5c/0x6f
[2023-12-27 12:36:02] [ 2.430859] Code: e8 1d 7a fd ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 98 60 9e bb e8 09 7a fd ff 0f 0b 48 89 fe 48 c7 c7 60 60 9e bb e8 f8 79 fd ff <0f> 0b 48 89 fe 48 c7 c7 30 60 9e bb e8 e7 79 fd ff 0f 0b 48 8b 04
[2023-12-27 12:36:02] [ 2.430875] RSP: 0018:ffffb8bc402e3a40 EFLAGS: 00010246
[2023-12-27 12:36:02] [ 2.430882] RAX: 000000000000004e RBX: ffff99605255c0c0 RCX: 0000000000000000
[2023-12-27 12:36:02] [ 2.430889] RDX: 0000000000000000 RSI: ffffffffbb9cb271 RDI: 00000000ffffffff
[2023-12-27 12:36:02] [ 2.430896] RBP: ffff99605255c0c0 R08: 0000000000000000 R09: ffffb8bc402e38e0
[2023-12-27 12:36:02] [ 2.430903] R10: 0000000000000003 R11: ffffffffbbd471c8 R12: ffff99605255c0d0
[2023-12-27 12:36:02] [ 2.430910] R13: ffffb8bc40275000 R14: ffff996041c39000 R15: 0000000000000010
[2023-12-27 12:36:02] [ 2.430918] FS: 000072e524a27940(0000) GS:ffff996057100000(0000) knlGS:0000000000000000
[2023-12-27 12:36:02] [ 2.430926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2023-12-27 12:36:02] [ 2.430932] CR2: 00005f8d83b9d000 CR3: 000000000d154000 CR4: 0000000000750ee0
[2023-12-27 12:36:02] [ 2.430940] PKRU: 55555554
[2023-12-27 12:36:02] [ 2.430943] Call Trace:
[2023-12-27 12:36:02] [ 2.430947] <TASK>
[2023-12-27 12:36:02] [ 2.430951] ? show_trace_log_lvl+0x1d3/0x2ef
[2023-12-27 12:36:02] [ 2.430958] ? show_trace_log_lvl+0x1d3/0x2ef
[2023-12-27 12:36:02] [ 2.430964] ? show_trace_log_lvl+0x1d3/0x2ef
[2023-12-27 12:36:02] [ 2.430970] ? xen_free_irq+0x92/0x100
[2023-12-27 12:36:02] [ 2.430976] ? __die_body.cold+0x8/0xd
[2023-12-27 12:36:02] [ 2.430981] ? die+0x2a/0x50
[2023-12-27 12:36:02] [ 2.430987] ? do_trap+0xc5/0x110
[2023-12-27 12:36:02] [ 2.430992] ? __list_del_entry_valid.cold+0x5c/0x6f
[2023-12-27 12:36:02] [ 2.430998] ? do_error_trap+0x6a/0x90
[2023-12-27 12:36:02] [ 2.431002] ? __list_del_entry_valid.cold+0x5c/0x6f
[2023-12-27 12:36:02] [ 2.431008] ? exc_invalid_op+0x4c/0x60
[2023-12-27 12:36:02] [ 2.431015] ? __list_del_entry_valid.cold+0x5c/0x6f
[2023-12-27 12:36:02] [ 2.431020] ? asm_exc_invalid_op+0x16/0x20
[2023-12-27 12:36:02] [ 2.431028] ? __list_del_entry_valid.cold+0x5c/0x6f
[2023-12-27 12:36:02] [ 2.431034] xen_free_irq+0x92/0x100
[2023-12-27 12:36:02] [ 2.431039] xen_destroy_irq+0x6c/0x120
[2023-12-27 12:36:02] [ 2.431044] xen_teardown_msi_irqs+0x3b/0x70
[2023-12-27 12:36:02] [ 2.431051] msi_domain_free_irqs_descs_locked+0x1b/0x40
[2023-12-27 12:36:02] [ 2.431058] pci_msi_teardown_msi_irqs+0x3e/0x40
[2023-12-27 12:36:02] [ 2.431065] __pci_enable_msix_range.part.0+0x2d4/0x490
[2023-12-27 12:36:02] [ 2.431072] pci_alloc_irq_vectors_affinity+0xa9/0x110
[2023-12-27 12:36:02] [ 2.431079] nhi_probe+0x1bf/0x510 [thunderbolt]
[2023-12-27 12:36:02] [ 2.431096] local_pci_probe+0x41/0x80
[2023-12-27 12:36:02] [ 2.431101] pci_call_probe+0x54/0x160
[2023-12-27 12:36:02] [ 2.431106] pci_device_probe+0x7c/0x100
[2023-12-27 12:36:02] [ 2.431113] ? driver_sysfs_add+0x71/0xd0
[2023-12-27 12:36:02] [ 2.431118] really_probe+0xde/0x380
[2023-12-27 12:36:02] [ 2.431123] ? pm_runtime_barrier+0x50/0x90
[2023-12-27 12:36:02] [ 2.431128] __driver_probe_device+0x78/0x120
[2023-12-27 12:36:02] [ 2.431134] driver_probe_device+0x1f/0x90
[2023-12-27 12:36:02] [ 2.431138] __driver_attach+0xce/0x1c0
[2023-12-27 12:36:02] [ 2.431143] ? __device_attach_driver+0x110/0x110
[2023-12-27 12:36:02] [ 2.431148] bus_for_each_dev+0x87/0xd0
[2023-12-27 12:36:02] [ 2.431154] bus_add_driver+0x1ae/0x200
[2023-12-27 12:36:02] [ 2.431159] driver_register+0x89/0xe0
[2023-12-27 12:36:02] [ 2.431163] nhi_init+0x5c/0x1000 [thunderbolt]
[2023-12-27 12:36:02] [ 2.431177] ? 0xffffffffc074b000
[2023-12-27 12:36:02] [ 2.431181] do_one_initcall+0x59/0x230
[2023-12-27 12:36:02] [ 2.431188] do_init_module+0x4a/0x1f0
[2023-12-27 12:36:02] [ 2.431194] __do_sys_finit_module+0xac/0x120
[2023-12-27 12:36:02] [ 2.431200] do_syscall_64+0x5b/0x80
[2023-12-27 12:36:02] [ 2.431206] ? do_syscall_64+0x67/0x80
[2023-12-27 12:36:02] [ 2.431210] entry_SYSCALL_64_after_hwframe+0x64/0xce
[2023-12-27 12:36:02] [ 2.431217] RIP: 0033:0x72e524d2cb4d
[2023-12-27 12:36:02] [ 2.431222] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b 92 0c 00 f7 d8 64 89 01 48
[2023-12-27 12:36:02] [ 2.431237] RSP: 002b:00007ffc467850c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[2023-12-27 12:36:02] [ 2.431245] RAX: ffffffffffffffda RBX: 00005f8d8390ce00 RCX: 000072e524d2cb4d
[2023-12-27 12:36:02] [ 2.431252] RDX: 0000000000000000 RSI: 000072e52524307d RDI: 0000000000000006
[2023-12-27 12:36:02] [ 2.431260] RBP: 00007ffc46785180 R08: 0000000000000000 R09: 00007ffc46785130
[2023-12-27 12:36:02] [ 2.431266] R10: 0000000000000006 R11: 0000000000000246 R12: 000072e52524307d
[2023-12-27 12:36:02] [ 2.431273] R13: 0000000000020000 R14: 00005f8d833ddb10 R15: 00005f8d833dda00
[2023-12-27 12:36:02] [ 2.431281] </TASK>
[2023-12-27 12:36:02] [ 2.431284] Modules linked in: intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 pcspkr thunderbolt(+) drm_vram_helper drm_ttm_helper ttm xhci_pci xhci_pci_renesas ehci_pci ata_generic xhci_hcd serio_raw i2c_piix4 ehci_hcd pata_acpi xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn fuse loop overlay xen_blkfront
[2023-12-27 12:36:02] [ 2.431335] ---[ end trace 0000000000000000 ]---
[2023-12-27 12:36:02] [ 2.431340] RIP: 0010:__list_del_entry_valid.cold+0x5c/0x6f
[2023-12-27 12:36:02] [ 2.431347] Code: e8 1d 7a fd ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 98 60 9e bb e8 09 7a fd ff 0f 0b 48 89 fe 48 c7 c7 60 60 9e bb e8 f8 79 fd ff <0f> 0b 48 89 fe 48 c7 c7 30 60 9e bb e8 e7 79 fd ff 0f 0b 48 8b 04
[2023-12-27 12:36:02] [ 2.431362] RSP: 0018:ffffb8bc402e3a40 EFLAGS: 00010246
[2023-12-27 12:36:02] [ 2.431368] RAX: 000000000000004e RBX: ffff99605255c0c0 RCX: 0000000000000000
[2023-12-27 12:36:02] [ 2.431375] RDX: 0000000000000000 RSI: ffffffffbb9cb271 RDI: 00000000ffffffff
[2023-12-27 12:36:02] [ 2.431381] RBP: ffff99605255c0c0 R08: 0000000000000000 R09: ffffb8bc402e38e0
[2023-12-27 12:36:02] [ 2.431388] R10: 0000000000000003 R11: ffffffffbbd471c8 R12: ffff99605255c0d0
[2023-12-27 12:36:02] [ 2.431396] R13: ffffb8bc40275000 R14: ffff996041c39000 R15: 0000000000000010
[2023-12-27 12:36:02] [ 2.431404] FS: 000072e524a27940(0000) GS:ffff996057100000(0000) knlGS:0000000000000000
[2023-12-27 12:36:02] [ 2.431411] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2023-12-27 12:36:02] [ 2.431417] CR2: 00005f8d83b9d000 CR3: 000000000d154000 CR4: 0000000000750ee0
[2023-12-27 12:36:02] [ 2.431424] PKRU: 55555554
[2023-12-27 12:36:02] [ 2.431426] Kernel panic - not syncing: Fatal exception
[2023-12-27 12:36:02] [ 2.431455] Kernel Offset: 0x39000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Test suite description
Default installation, preserve disk for further tests
Reproducible
Fails since (at least) Build 2023120510-4.2
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
With kernel-latest, the whole host (dom0?) crashes when starting sys-net/sys-usb: https://openqa.qubes-os.org/tests/89014 (no console log, sadly).
On my machine, the host hard-reboots if a device is connected to the USB4/Thunderbolt port when sys-usb starts. Starting sys-usb with nothing connected works.
Still happens with kernel 6.9.4-1.qubes.fc37.x86_64.
Rebooting sys-usb (with or without anything connected) does the same. This may be related to passthrough/FLR. For every USB controller, I see in dmesg:
pciback 0000:c3:00.3: xen-pciback: Driver tried to write to a read-only configuration space field at offset 0x2a6, size 2. This may be harmless, but if you have problems with your device:
1) see permissive attribute in sysfs
2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.
I'm experimenting with no-strict-reset=True|False on the USB4 controllers, but this does not seem to help much.