BUG: ConnectX-4 Lx NIC is intermittently lost on boot or reboot
Please fill in the following information.
Install ENV: (You can find it in the boot interface.)
- DMI: qemu
- CPU:
- NIC: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
RR version: (You can find it in the update menu.)
- RR: 24.9.1
- addons:
- modules:
- lkms:
DSM:
- model: DS920+
- version: 7.2.2
Issue:
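On boot or reboot, the ConnectX-4 Lx NIC intermittently fails to initialize, so no eth interface is created and the NAS becomes unreachable. I also tried another model, SA6400, with the same result.
There is no problem during the RR stage; the NIC is only lost, intermittently, once boot reaches the DSM kernel.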
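One way to confirm the symptom on an affected boot is to check whether any network interface is bound to the `mlx5_core` driver. The following is a minimal sketch, assuming a Python 3 interpreter is available on the NAS and that sysfs is mounted at `/sys` with the usual Linux layout; the function name `netdevs_for_driver` is hypothetical and not part of RR or DSM.

```python
import os

def netdevs_for_driver(driver="mlx5_core", sysfs_root="/sys"):
    """Return the names of network interfaces bound to the given kernel driver.

    Relies on the standard sysfs layout, where
    <sysfs_root>/class/net/<iface>/device/driver is a symlink to the driver
    directory. Virtual interfaces such as lo have no device entry and are skipped.
    """
    net_dir = os.path.join(sysfs_root, "class", "net")
    found = []
    for name in sorted(os.listdir(net_dir)):
        driver_link = os.path.join(net_dir, name, "device", "driver")
        if not os.path.exists(driver_link):
            continue  # no backing device, or no driver bound
        if os.path.basename(os.path.realpath(driver_link)) == driver:
            found.append(name)
    return found
```

Calling `netdevs_for_driver()` on a boot where the NIC came up should list its interface name; an empty list on a failed boot would match the "no eth interface is created" behaviour described above.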
logs:
SynologyNAS> [ 126.411146] mlx5_core 0000:00:11.0: 0000:00:11.0:wait_func:790:(pid 3851): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 126.414546] mlx5_core 0000:00:11.0: 0000:00:11.0:page_notify_fail:308:(pid 3851): page notify failed
[ 126.415060] mlx5_core 0000:00:11.0: 0000:00:11.0:wait_func:790:(pid 4999): ALLOC_UAR(0x802) timeout. Will cause a leak of a command resource
[ 126.415062] mlx5_core 0000:00:11.0: 0000:00:11.0:mlx5_alloc_map_uar:237:(pid 4999): mlx5_cmd_alloc_uar() failed, -110
[ 126.415064] mlx5_core 0000:00:11.0: 0000:00:11.0:mlx5e_create_netdev:2141:(pid 4999): alloc_map uar failed, -110
[ 126.415203] udevd[4999]: failed to send result of seq 1835 to main daemon: Connection refused
[ 126.426965] mlx5_core 0000:00:11.0: 0000:00:11.0:pages_work_handler:443:(pid 3851): give fail -110
^C
SynologyNAS> [ 186.428088] mlx5_core 0000:00:11.0: 0000:00:11.0:wait_func:790:(pid 3851): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 186.431379] mlx5_core 0000:00:11.0: 0000:00:11.0:reclaim_pages:407:(pid 3851): failed reclaiming pages
[ 186.433866] mlx5_core 0000:00:11.0: 0000:00:11.0:pages_work_handler:443:(pid 3851): reclaim fail -110
[ 246.436094] mlx5_core 0000:00:11.0: 0000:00:11.0:wait_func:790:(pid 3851): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 246.439622] mlx5_core 0000:00:11.0: 0000:00:11.0:reclaim_pages:407:(pid 3851): failed reclaiming pages
[ 246.442351] mlx5_core 0000:00:11.0: 0000:00:11.0:pages_work_handler:443:(pid 3851): reclaim fail -110
[ 306.445143] mlx5_core 0000:00:11.0: 0000:00:11.0:wait_func:790:(pid 3851): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 306.447718] mlx5_core 0000:00:11.0: 0000:00:11.0:reclaim_pages:407:(pid 3851): failed reclaiming pages
[ 306.449502] mlx5_core 0000:00:11.0: 0000:00:11.0:pages_work_handler:443:(pid 3851): reclaim fail -110
00:11.0 Class 0200: Device 15b3:1015
        Subsystem: Device 15b3:0069
        Flags: bus master, fast devsel, latency 0, IRQ 10
        Memory at 7030000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at c1600000 [disabled] [size=1M]
        Capabilities: [60] Express Endpoint, IntMsgNum 0
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
        Kernel driver in use: mlx5_core
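
To compare a failing boot with a good one, it can help to count which mlx5 firmware commands timed out. The sketch below is one possible way to do that, assuming the kernel log has been saved as text in the same format as the lines above; the regular expression and the function name `summarize_timeouts` are illustrative assumptions, not part of the mlx5 driver or RR.

```python
import re
from collections import Counter

# Matches entries such as:
# [ 126.411146] mlx5_core 0000:00:11.0: ...:(pid 3851): MANAGE_PAGES(0x108) timeout. ...
# "[^\[]*?" stops the match from running into the next "[ timestamp]" entry,
# so the pattern also works if several entries share one line.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+mlx5_core\s+(?P<dev>[0-9a-fA-F:.]+):"
    r"[^\[]*?:\s+(?P<cmd>[A-Z0-9_]+)\((?P<opcode>0x[0-9a-fA-F]+)\) timeout"
)

def summarize_timeouts(log_text):
    """Count mlx5_core command timeouts per (PCI device, command name).

    Returns (counts, first_seen), where first_seen maps each key to the
    kernel timestamp (in seconds) of its first timeout.
    """
    counts = Counter()
    first_seen = {}
    for match in TIMEOUT_RE.finditer(log_text):
        key = (match.group("dev"), match.group("cmd"))
        counts[key] += 1
        first_seen.setdefault(key, float(match.group("ts")))
    return counts, first_seen
```

The `-110` return codes in the log correspond to `ETIMEDOUT` on Linux, which is consistent with the driver giving up on commands that the device did not answer.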
(## Because the log contains sensitive information such as SN/MAC, please redact it before providing the complete file. You can also send the file to my email instead. ##)
...
(Please review the content of #173, #175, and #226 first.)
...
(If you just say XXX doesn't work without providing any details, all I can say is thank you for your feedback.)
...
test v24.12.1