BUG: scheduling while atomic with latest dev branch
The current version of the dev driver exhibits this bug. We receive this frequently, which ends up crashing the kernel:
BUG: scheduling while atomic: wpa_supplicant/903/0x00000202
Modules linked in: wilc_sdio(O) cfg80211 rfkill nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_recent xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vf610_adc evbug
Preemption disabled at:[< (null)>] (null)
CPU: 0 PID: 903 Comm: wpa_supplicant Tainted: G O 4.1.36-fslc+g2fa70a9 #4
Hardware name: Freescale i.MX6 xxxxx (Device Tree)
[<80015b74>] (unwind_backtrace) from [<8001273c>] (show_stack+0x10/0x14)
[<8001273c>] (show_stack) from [<8057b344>] (dump_stack+0x88/0x9c)
[<8057b344>] (dump_stack) from [<800507b8>] (__schedule_bug+0x6c/0xc0)
[<800507b8>] (__schedule_bug) from [<8057c618>] (__schedule+0x400/0x50c)
[<8057c618>] (__schedule) from [<8057c764>] (schedule+0x40/0x98)
[<8057c764>] (schedule) from [<8057eefc>] (schedule_timeout+0x10c/0x16c)
[<8057eefc>] (schedule_timeout) from [<8057d1d8>] (wait_for_common+0x8c/0x138)
[<8057d1d8>] (wait_for_common) from [<7f0d0248>] (cfg_set+0xd8/0x144 [wilc_sdio])
[<7f0d0248>] (cfg_set [wilc_sdio]) from [<7f0d04ac>] (wilc_send_config_pkt+0xc8/0x258 [wilc_sdio])
[<7f0d04ac>] (wilc_send_config_pkt [wilc_sdio]) from [<7f0cd930>] (wilc_frame_register+0x8c/0x128 [wilc_sdio])
[<7f0cd930>] (wilc_frame_register [wilc_sdio]) from [<7f09d4ac>] (cfg80211_mlme_unregister_socket+0x90/0x118 [cfg80211])
[<7f09d4ac>] (cfg80211_mlme_unregister_socket [cfg80211]) from [<7f095c40>] (nl80211_netlink_notify+0xb8/0x280 [cfg80211])
[<7f095c40>] (nl80211_netlink_notify [cfg80211]) from [<8004d38c>] (notifier_call_chain+0x44/0x84)
[<8004d38c>] (notifier_call_chain) from [<8004d610>] (atomic_notifier_call_chain+0x34/0x48)
[<8004d610>] (atomic_notifier_call_chain) from [<804aa418>] (netlink_release+0x404/0x4e8)
[<804aa418>] (netlink_release) from [<804691e4>] (sock_release+0x20/0xa8)
[<804691e4>] (sock_release) from [<80469278>] (sock_close+0xc/0x14)
[<80469278>] (sock_close) from [<800e8f04>] (__fput+0x80/0x1c8)
[<800e8f04>] (__fput) from [<8004b208>] (task_work_run+0xac/0xe4)
[<8004b208>] (task_work_run) from [<8001236c>] (do_work_pending+0x7c/0xa4)
[<8001236c>] (do_work_pending) from [<8000f50c>] (work_pending+0xc/0x20)
Testing different commits seems to show commit f9b8665260bd7fed92b292d7e1d1f7a0a0f2053b to be the cause.
The kernel warning seems to imply that cfg_set is being called in an atomic context, where the wait_for_completion_timeout call is not allowed.
Note that wait_for_completion() is calling spin_lock_irq()/spin_unlock_irq(), so it can only be called safely when you know that interrupts are enabled. Calling it from IRQs-off atomic contexts will result in hard-to-detect spurious enabling of interrupts. (https://www.kernel.org/doc/Documentation/scheduler/completion.txt)
Seems like the call to wait_for_completion_timeout on line 1795 of wilc_wlan.c seems to possibly be the issue. I'm not familiar with the driver enough to know how to solve right now though.
platform is i.MX6 running kernel 4.1.
Do you know how to trigger it? I don't think I've seen that one before.
Very easily. This is kernel 4.1,
Using this yocto recipe and the 0b5df6c6353e8258c9293610029ebb522d87001c firmware.
LICENSE = "GPLv2"
LIC_FILES_CHKSUM = "file://wilc_sdio.c;endline=5;md5=b9fc88c1971d6e0f359cc1b84416d289"
# dev branch code 15.3
SRCREV = "e805df3133664dc99480160da21c70110a5c65bd"
SRC_URI = "git://github.com/linux4wilc/driver.git;protocol=http;branch=dev"
S = "${WORKDIR}/git/wilc"
inherit module
RDEPENDS_${PN} = "linux-firmware-mchp-wilc1000 (= ${EXTENDPKGV})"
EXTRA_OEMAKE = '-C ${STAGING_KERNEL_DIR} M=${S} \
CONFIG_WILC=m \
CONFIG_WILC_SDIO=m \
BUS=SDIO \
'
do_configure_prepend() {
#powersave seems to cause issues
sed -i 's/MIN_FAST_PS/NO_POWERSAVE/' ${S}/host_interface.c
#remove hardcoded interface name length
sed -i 's/IFC_0, 5/IFC_0, strlen(IFC_0)/g' ${S}/wilc_netdev.c
sed -i 's/IFC_1, 4/IFC_1, strlen(IFC_1)/g' ${S}/wilc_netdev.c
# rename p2p interface
#sed -i 's/p2p0/wlan1/' ${S}/host_interface.h
#sed -i 's/p2p%d/wlan%d/' ${S}/wilc_netdev.c
# turn down verbosity
sed -i 's/-DWILC_DEBUGFS//' ${S}/Makefile
sed -i 's/.*INIT(INIT.*/atomic_t WILC_DEBUG_REGION = ATOMIC_INIT(0);/' ${S}/wilc_debugfs.c
sed -i '/.*CFG80211_DBG.*/d' ${S}/wilc_debugfs.c
sed -i '/.*PWRDEV_DBG.*/d' ${S}/wilc_debugfs.c
}
do_install() {
install -d ${D}${base_libdir}/modules/${KERNEL_VERSION}/kernel/drivers/net/wireless/mchp/wilc
install -m 0644 ${S}/*${KERNEL_OBJECT_SUFFIX} ${D}${base_libdir}/modules/${KERNEL_VERSION}/kernel/drivers/net/wireless/mchp/wilc
}
Then just enter a valid network into the wpa_supplicant.conf, and bring up the interface and wpa_supplicant:
root@myhostname:~# uname -a
Linux myhostname 4.1.36-fslc+g2fa70a90615b #1 SMP PREEMPT Sat May 18 01:03:42 CDT 2019 armv7l GNU/Linux
root@myhostname:~# lsmod | grep -i wilc
root@myhostname:~# modulemanager.sh wifi start
mmc0: queuing unknown CIS tuple 0x91 (2 bytes)
mmc0: new high speed SDIO card at address 0001
Initializing Locks ...
wifi_pm : 0
wilc_sdio mmc0:0001:1: succesfully got gpio_reset
wilc_sdio mmc0:0001:1: succesfully got gpio_chip_en
wifi_pm : 1
wilc_sdio mmc0:0001:1: succesfully got gpio_reset
wilc_sdio mmc0:0001:1: succesfully got gpio_chip_en
wilc_sdio mmc0:0001:1: Driver Initializing success
De-Initializing Locks
Module_exit Done.
at_pwr_dev: deinit
at_pwr_dev: unregistered
mmc0: card 0001 removed
mmc0: queuing unknown CIS tuple 0x91 (2 bytes)
mmc0: new high speed SDIO card at address 0001
/usr/sbin/modulemanager.sh: wifi start: Action completed
root@myhostname:~# Initializing Locks ...
wilc_sdio mmc0:0001:1: Driver Initializing success
root@myhostname:~# ifconfig -a
xxx
p2p0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
xxx
wlan0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
root@myhostname:~# wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf
Successfully initialized wpa_supplicant
WILC POWER UP
wilc_sdio mmc0:0001:1: SDIO speed: 50000000
wilc_sdio mmc0:0001:1: chipid 001003a0
wilc_sdio mmc0:0001:1: got gpio_irq successfully
root@myhostname:~# BUG: scheduling while atomic: wpa_supplicant/5330/0x00000202
Modules linked in: wilc_sdio(O) cfg80211 rfkill nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_recent xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vf610_adc evbug
Preemption disabled at:[< (null)>] (null)
CPU: 0 PID: 5330 Comm: wpa_supplicant Tainted: G O 4.1.36-fslc+g2fa70a90615b #1
Hardware name: Freescale i.MX6 xxx (Device Tree)
[<80015b74>] (unwind_backtrace) from [<8001273c>] (show_stack+0x10/0x14)
[<8001273c>] (show_stack) from [<8057b344>] (dump_stack+0x88/0x9c)
[<8057b344>] (dump_stack) from [<800507b8>] (__schedule_bug+0x6c/0xc0)
[<800507b8>] (__schedule_bug) from [<8057c618>] (__schedule+0x400/0x50c)
[<8057c618>] (__schedule) from [<8057c764>] (schedule+0x40/0x98)
[<8057c764>] (schedule) from [<8057eefc>] (schedule_timeout+0x10c/0x16c)
[<8057eefc>] (schedule_timeout) from [<8057d1d8>] (wait_for_common+0x8c/0x138)
[<8057d1d8>] (wait_for_common) from [<7f0d0268>] (cfg_set+0xd8/0x15c [wilc_sdio])
[<7f0d0268>] (cfg_set [wilc_sdio]) from [<7f0d04f8>] (wilc_send_config_pkt+0xc8/0x258 [wilc_sdio])
[<7f0d04f8>] (wilc_send_config_pkt [wilc_sdio]) from [<7f0cd958>] (wilc_frame_register+0x8c/0x128 [wilc_sdio])
[<7f0cd958>] (wilc_frame_register [wilc_sdio]) from [<7f09d4ac>] (cfg80211_mlme_unregister_socket+0x90/0x118 [cfg80211])
[<7f09d4ac>] (cfg80211_mlme_unregister_socket [cfg80211]) from [<7f095c40>] (nl80211_netlink_notify+0xb8/0x280 [cfg80211])
[<7f095c40>] (nl80211_netlink_notify [cfg80211]) from [<8004d38c>] (notifier_call_chain+0x44/0x84)
[<8004d38c>] (notifier_call_chain) from [<8004d610>] (atomic_notifier_call_chain+0x34/0x48)
[<8004d610>] (atomic_notifier_call_chain) from [<804aa418>] (netlink_release+0x404/0x4e8)
[<804aa418>] (netlink_release) from [<804691e4>] (sock_release+0x20/0xa8)
[<804691e4>] (sock_release) from [<80469278>] (sock_close+0xc/0x14)
[<80469278>] (sock_close) from [<800e8f04>] (__fput+0x80/0x1c8)
[<800e8f04>] (__fput) from [<8004b208>] (task_work_run+0xac/0xe4)
[<8004b208>] (task_work_run) from [<8001236c>] (do_work_pending+0x7c/0xa4)
[<8001236c>] (do_work_pending) from [<8000f50c>] (work_pending+0xc/0x20)
It looks like the type of context for cfg's calls change with the kernel version, or maybe the level of warning the kernel is configured to. I can't see that on sama5d4 with 4.9 and 4.19 kernels neither.
The change was actually done based on the linux wireless subsystem feedback on the driver https://www.spinics.net/lists/linux-wireless/msg179293.html, recommending to do the actual work from the cfg context. I'll see if the wait for completion can be changed to avoid sleeping.
I'm still surprised though on why do the frame_register is being called multiple times
@AdhamAbozaeid thank you, let me know what you come up with. We are very interested in integrating the latest fixes since 15.2.1 but this is blocking us.
@deanoati If this is affecting your production schedule, please open a salesforce ticket to prioritize this issue
I have the same problem with kernel 4.1, driver & firmware version 15_3_1
Then just enter a valid network into the wpa_supplicant.conf, and bring up the interface and wpa_supplicant: i have this problem always with and without valid network
modprobe wilc-sdio.ko ifconfig wlan0 up wpa_supplicant -i wlan0 -c /etc/wpa_supplicant.conf -B
[ 65.004194] BUG: scheduling while atomic: wpa_supplicant/295/0x00000202 [ 65.010855] Modules linked in: wilc_sdio(O) cfg80211 imx_rpmsg_tty ho740_uart(O) gpib(O) ts_ssd254x(O) bt_leds(O) galcore(O) [ 65.010888] Preemption disabled at:[< (null)>] (null) [ 65.016243] [ 65.016257] CPU: 0 PID: 295 Comm: wpa_supplicant Tainted: G O 4.1.15 #11 [ 65.016264] Hardware name: Freescale i.MX6 SoloX (Device Tree) [ 65.016297] [<80015b74>] (unwind_backtrace) from [<800126c4>] (show_stack+0x10/0x14) [ 65.016313] [<800126c4>] (show_stack) from [<804d5844>] (dump_stack+0x74/0xb4) [ 65.016328] [<804d5844>] (dump_stack) from [<8004a318>] (__schedule_bug+0x6c/0xbc) [ 65.016340] [<8004a318>] (__schedule_bug) from [<804d65fc>] (__schedule+0x48c/0x544) [ 65.016350] [<804d65fc>] (__schedule) from [<804d66f4>] (schedule+0x40/0x98) [ 65.016362] [<804d66f4>] (schedule) from [<804d90a0>] (schedule_timeout+0x118/0x178) [ 65.016376] [<804d90a0>] (schedule_timeout) from [<804d72b0>] (wait_for_common+0x90/0x148) [ 65.016457] [<804d72b0>] (wait_for_common) from [<7f0e9188>] (cfg_set+0xd8/0x154 [wilc_sdio]) [ 65.016549] [<7f0e9188>] (cfg_set [wilc_sdio]) from [<7f0e93dc>] (wilc_send_config_pkt+0xd4/0x25c [wilc_sdio]) [ 65.016620] [<7f0e93dc>] (wilc_send_config_pkt [wilc_sdio]) from [<7f0e6968>] (wilc_frame_register+0x80/0x11c [wilc_sdio]) [ 65.016739] [<7f0e6968>] (wilc_frame_register [wilc_sdio]) from [<7f0b46f4>] (cfg80211_mlme_register_mgmt+0x14c/0x180 [cfg80211]) [ 65.016840] [<7f0b46f4>] (cfg80211_mlme_register_mgmt [cfg80211]) from [<7f0a7030>] (nl80211_register_mgmt+0x74/0x8c [cfg80211]) [ 65.016941] [<7f0a7030>] (nl80211_register_mgmt [cfg80211]) from [<8042483c>] (genl_rcv_msg+0x22c/0x3a0) [ 65.016958] [<8042483c>] (genl_rcv_msg) from [<80423c40>] (netlink_rcv_skb+0xb4/0xd8) [ 65.016970] [<80423c40>] (netlink_rcv_skb) from [<80424600>] (genl_rcv+0x24/0x34) [ 65.016982] [<80424600>] (genl_rcv) from [<804235c4>] (netlink_unicast+0x158/0x20c) [ 65.016994] [<804235c4>] (netlink_unicast) from [<80423a20>] (netlink_sendmsg+0x2e0/0x348) [ 65.017009] [<80423a20>] (netlink_sendmsg) from [<803e5388>] (sock_sendmsg+0x14/0x24) [ 65.017022] [<803e5388>] (sock_sendmsg) from [<803e5fec>] (___sys_sendmsg+0x1cc/0x1d4) [ 65.017035] [<803e5fec>] (___sys_sendmsg) from [<803e6c8c>] (__sys_sendmsg+0x3c/0x68) [ 65.017050] [<803e6c8c>] (__sys_sendmsg) from [<8000f400>] (ret_fast_syscall+0x0/0x3c) [ 65.031894] BUG: scheduling while atomic: wpa_supplicant/295/0x00000202
wilc_mgmt_frame_register() is called in atomic if you are using kernel 4.1. The dev driver is using mutex lock under wilc_mgmt_frame_register functions.
It seems that this problem can be fixed for kernel 4.1 by this patch - https://gitlab.freedesktop.org/lima/linux/commit/33d8783c58427683b533664f67f8c4378ed64495 Latest driver from dev branch does not fail with "scheduling while atomic" error for me anymore.
Thanks @AndreyEroshenko For the records, this patch was released in linux 4.3