kernel NULL pointer at PC is at debug_thread+0x144/0x2e8 [wilc_sdio]
we use wilc1000 in cpu NUC976(arm926t core)with linux 3.10.108. wilc module work well except for scheduling bug at https://github.com/linux4wilc/driver/issues/83 (Not yet solved).
There are two ways to to reproduce this "kernel NULL pointer Oops".
**1、Long time ping with packet size = 8000 **
when wilc1000 sucess to connect to a AP, we continuous ping the AP's gateway ip 192.168.50.1 . when sucessfull reach 173206 times(almost 49 hours), the ping cmd fail to ping any more. At this time the wilc driver log out "cfg_get Timed Out", and end with "kernel NULL pointer".
Below is the log:
8008 bytes from 192.168.50.1: seq=42118 ttl=64 time=130.000 ms
8008 bytes from 192.168.50.1: seq=42119 ttl=64 time=1020.000 ms
8008 bytes from 192.168.50.1: seq=42120 ttl=64 time=10.000 ms
8008 bytes from 192.168.50.1: seq=42121 ttl=64 time=1030.000 ms
8008 bytes from 192.168.50.1: seq=42122 ttl=64 time=20.000 ms
8008 bytes from 192.168.50.1: seq=42123 ttl=64 time=90.000 ms
[176580.360000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
8008 bytes from 192.168.50.1: seq=42124 ttl=64 time=110.000 ms
8008 bytes from 192.168.50.1: seq=42125 ttl=64 time=120.000 ms
8008 bytes from 192.168.50.1: seq=42126 ttl=64 time=1020.000 ms
8008 bytes from 192.168.50.1: seq=42127 ttl=64 time=10.000 ms
8008 bytes from 192.168.50.1: seq=42128 ttl=64 time=170.000 ms
8008 bytes from 192.168.50.1: seq=42129 ttl=64 time=70.000 ms
[176586.360000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
8008 bytes from 192.168.50.1: seq=42130 ttl=64 time=90.000 ms
8008 bytes from 192.168.50.1: seq=42131 ttl=64 time=200.000 ms
8008 bytes from 192.168.50.1: seq=42132 ttl=64 time=120.000 ms
8008 bytes from 192.168.50.1: seq=42133 ttl=64 time=1020.000 ms
8008 bytes from 192.168.50.1: seq=42134 ttl=64 time=10.000 ms
8008 bytes from 192.168.50.1: seq=42135 ttl=64 time=160.000 ms
[176592.360000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
8008 bytes from 192.168.50.1: seq=42136 ttl=64 time=70.000 ms
8008 bytes from 192.168.50.1: seq=42137 ttl=64 time=90.000 ms
[176598.360000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[176600.370000] cfg_get Timed Out
[176600.370000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[176600.390000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[176604.360000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[176605.370000] cfg_get Timed Out
[176605.370000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[176605.390000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[176610.360000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[176610.380000] cfg_get Timed Out
[176610.380000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[176610.400000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[176615.410000] cfg_get Timed Out
[176615.410000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[176615.420000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[176616.360000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[176620.400000] cfg_get Timed Out
[176620.400000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[176620.410000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[176622.360000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[176622.360000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]<Recover>
[176622.380000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mac_close]Mac close
[176622.380000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mac_close]Deinitializing wilc
[176622.400000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing wilc ...
[176622.400000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Disabling IRQ
[176622.440000] wilc_sdio mmc0:0001:1: wilc_sdio_disable_interrupt
[176622.450000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing Threads
[176622.470000] wilc_sdio mmc0:0001:1 wlan0: INFO [wlan_deinitialize_threads]Deinitializing Threads
[176622.470000] wilc_sdio mmc0:0001:1 wlan0: INFO [wlan_deinitialize_threads]Deinitializing Threads
[176622.490000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing IRQ
[176622.500000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing WILC Wlan
[176622.510000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]wilc deinitialization Done
[176622.520000] Unable to handle kernel NULL pointer dereference at virtual address 00000080
[176622.530000] pgd = c0004000
[176622.530000] [00000080] *pgd=00000000
[176622.530000] Internal error: Oops: 17 [#1] PREEMPT ARM
[176622.530000] Modules linked in: wilc_sdio(C) watchdogd(O) usb_f_rndis u_ether libcomposite
[176622.530000] CPU: 0 PID: 1647 Comm: WILC_DEBUG Tainted: G WC O 3.10.108 #5
[176622.530000] task: c38d82c0 ti: c1fd8000 task.ti: c1fd8000
[176622.530000] PC is at debug_thread+0x144/0x2e8 [wilc_sdio]
[176622.530000] LR is at wake_up_klogd+0x54/0x7c
[176622.530000] pc : [
^C --- 192.168.50.1 ping statistics --- 253244 packets transmitted, 173206 packets received, 31% packet loss round-trip min/avg/max = 10.000/238.313/1380.000 ms
2、do a ESD Test to wilc module Another ways to quickly reproduce kernel NULL pointer Oops is to do a ESD Test to wilc module, which will cause wilc_sdio_cmd52 time-out, and also lead to kernel NULL pointer Oops:
[ 405.740000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 411.740000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 417.760000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 423.760000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 429.760000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 435.760000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 441.760000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 447.760000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 453.760000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 455.700000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 455.700000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fa) ...
[ 455.710000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 455.720000] wilc_sdio mmc0:0001:1: Failed cmd52, set 0x10c data...
[ 455.730000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_wlan_handle_txq:1053] fail read reg vmm_tbl_entry..
[ 455.740000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 455.750000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fc) ...
[ 458.700000] cfg_get Timed Out
[ 458.700000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[ 458.710000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[ 459.760000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 460.710000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 460.710000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fa) ...
[ 460.720000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 460.730000] wilc_sdio mmc0:0001:1: Failed cmd52, set 0x10c data...
[ 460.740000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_wlan_handle_txq:1053] fail read reg vmm_tbl_entry..
[ 460.750000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 460.760000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fc) ...
[ 463.720000] cfg_get Timed Out
[ 463.720000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[ 463.730000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[ 465.720000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 465.720000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fa) ...
[ 465.730000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 465.740000] wilc_sdio mmc0:0001:1: Failed cmd52, set 0x10c data...
[ 465.750000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_wlan_handle_txq:1053] fail read reg vmm_tbl_entry..
[ 465.760000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 465.770000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 465.780000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fc) ...
[ 468.720000] cfg_get Timed Out
[ 468.720000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[ 468.730000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[ 470.730000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 470.730000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fa) ...
[ 470.740000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 470.750000] wilc_sdio mmc0:0001:1: Failed cmd52, set 0x10c data...
[ 470.760000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_wlan_handle_txq:1053] fail read reg vmm_tbl_entry..
[ 470.770000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 470.780000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fc) ...
[ 471.780000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 473.730000] cfg_get Timed Out
[ 473.730000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[ 473.740000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[ 475.740000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 475.740000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fa) ...
[ 475.750000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 475.760000] wilc_sdio mmc0:0001:1: Failed cmd52, set 0x10c data...
[ 475.770000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_wlan_handle_txq:1053] fail read reg vmm_tbl_entry..
[ 475.780000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 475.790000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fc) ...
[ 477.780000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 478.740000] cfg_get Timed Out
[ 478.740000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[ 478.750000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[ 480.750000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 480.750000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fa) ...
[ 480.760000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 480.770000] wilc_sdio mmc0:0001:1: Failed cmd52, set 0x10c data...
[ 480.780000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_wlan_handle_txq:1053] fail read reg vmm_tbl_entry..
[ 480.790000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 480.800000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fc) ...
[ 483.760000] cfg_get Timed Out
[ 483.760000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1786] Get Timed out
[ 483.770000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_get_statistics:1084] Failed to send scan parameters
[ 483.780000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
[ 483.790000] wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]<Recover>
[ 483.790000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mac_close]Mac close
[ 483.800000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mac_close]Deinitializing wilc
[ 483.810000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing wilc ...
[ 483.820000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Disabling IRQ
[ 483.830000] wilc_sdio mmc0:0001:1: wilc_sdio_disable_interrupt
[ 483.840000] wilc_sdio mmc0:0001:1: can't release sdio_irq, err(-110)
[ 483.850000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing Threads
[ 483.850000] wilc_sdio mmc0:0001:1 wlan0: INFO [wlan_deinitialize_threads]Deinitializing Threads
[ 483.870000] wilc_sdio mmc0:0001:1 wlan0: INFO [wlan_deinitialize_threads]Deinitializing Threads
[ 483.870000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing IRQ
[ 483.880000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 483.890000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fa) ...
[ 483.900000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 483.910000] wilc_sdio mmc0:0001:1: Failed cmd52, set 0x10c data...
[ 483.930000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_wlan_stop:1570] Error while reading reg
[ 483.950000] wilc_sdio mmc0:0001:1: wilc_sdio_cmd52..failed, err(-110)
[ 483.950000] wilc_sdio mmc0:0001:1: Failed cmd 52, read reg (000000fc) ...
[ 483.970000] wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_wlan_deinitialize:852] failed in wlan_stop
[ 483.970000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing WILC Wlan
[ 483.990000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]wilc deinitialization Done
[ 483.990000] Unable to handle kernel NULL pointer dereference at virtual address 00000080
[ 484.010000] pgd = c0004000
[ 484.010000] [00000080] *pgd=00000000
[ 484.010000] Internal error: Oops: 17 [#1] PREEMPT ARM
[ 484.010000] Modules linked in: wilc_sdio(C) qmi_wwan option usb_wwan watchdogd(O) usb_f_rndis u_ether libcomposite
[ 484.010000] CPU: 0 PID: 1205 Comm: WILC_DEBUG Tainted: G WC O 3.10.108 #6
[ 484.010000] task: c3876dc0 ti: c1d8c000 task.ti: c1d8c000
[ 484.010000] PC is at debug_thread+0x144/0x2e8 [wilc_sdio]
[ 484.010000] LR is at wake_up_klogd+0x54/0x7c
[ 484.010000] pc : [
When we dig into wilc driver and disassembly this kernel NULL pointer Oops,we got: debug_thread+0x144 = 0x222c+0x144=0x2370,locate to disassembly dump file:
0000222c <debug_thread>: int debug_running; int recovery_on; int wait_for_recovery; static int (void *arg) { 222c: e1a0c00d mov ip, sp 2230: e92ddff0 push {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc} ... for (i = 0; i < WILC_NUM_CONCURRENT_IFC; i++) wilc_mac_close(wl->vif[i]->ndev); 235c: e5943024 ldr r3, [r4, #36] ; 0x24 2360: e3a0a002 mov sl, #2 2364: e5930080 ldr r0, [r3, #128] ; 0x80 2368: ebfffb22 bl ff8 <wilc_mac_close> 236c: e5943028 ldr r3, [r4, #40] ; 0x28 2370: e5930080 ldr r0, [r3, #128] ; 0x80 // PC is at this line 2374: ebfffb1f bl ff8 <wilc_mac_close> 2378: e3a0300a mov r3, #10 for (i = WILC_NUM_CONCURRENT_IFC; i > 0; i--) { while (wilc_mac_open(wl->vif[i-1]->ndev) && --timeout) 237c: e24a2001 sub r2, sl, #1 2380: e28a9007 add r9, sl, #7
So we guess kernel NULL pointer Oops come from "wilc_mac_close(wl->vif[i]->ndev)", and we modify debug_thread() at wilc_netdev.c by adding NULL pointer check before access ndev of wl->vif[i],below is my modify: ---- a/linux-3.10.108/drivers/staging/wilc/wilc_netdev.c +++ b/linux-3.10.108/drivers/staging/wilc/wilc_netdev.c @@ -69,9 +69,13 @@ static int debug_thread(void *arg) timeout = 10; recovery_on = 1; wait_for_recovery = 1; -- for (i = 0; i < WILC_NUM_CONCURRENT_IFC; i++) -- wilc_mac_close(wl->vif[i]->ndev); ++ for (i = 0; i < WILC_NUM_CONCURRENT_IFC; i++) { ++ printk("close: wl->vif[%d] == %p\n", i, wl->vif[i]); ++ if(wl->vif[i]) ++ wilc_mac_close(wl->vif[i]->ndev); ++ } for (i = WILC_NUM_CONCURRENT_IFC; i > 0; i--) { ++ if(wl->vif[i-1] == NULL) {printk("open: wl->vif[%d] == %p\n", i-1, wl->vif[i-1]); continue;} while (wilc_mac_open(wl->vif[i-1]->ndev) && --timeout) msleep(100);
After this modify we don't have this kernel NULL pointer Oops any more, and the wilc driver go forward with other errors, we think the wilc driver can't recover itself from wilc_sdio_cmd52-failed, at this point console shell alway be hang if we type “ifconfig wlan0” or type “ifconfig wlan0 down” or type “rmmod wilc-sdio”, the only way we can do is to reboot the whole board.
after we re-power-on or reset the whole board, wilc work well again.
Our questions: 1、This kernel NULL pointer Oops is wilc driver bug? 2、After we modify debug_thread(void *arg) in wilc_netdev.c to avoid kernel NULL pointer Oops,but wilc_sdio_cmd52-failed remain,how can we fix this issue? 3、How to let wilc driver return to work,except reboot the whole board?
Finally i found out the reason why hang while type "rmmod wilc-sdio" after wilc_sdio_cmd52-failed . it is realate to a infinish loop"while (wait_for_recovery)" in wilc_send_config_pkt(), see my modify below:
int wilc_send_config_pkt(struct wilc_vif *vif, u8 mode, struct wid *wids, u32 count) { int i; int ret = 0; u32 drv = wilc_get_vif_idx(vif); ++ int cnt = 0;
if (wait_for_recovery) {
PRINT_INFO(vif->ndev, CORECONFIG_DBG,
"Host interface is suspended\n");
while (wait_for_recovery) {
msleep(300);
++ cnt++; ++ if(cnt > 10) { ++ printk("[MCB] wilc_send_config_pkt wait_for_recovery timeout(3s)!!!!\n"); ++ ret = -ETIMEDOUT; ++ return ret; ++ } } ... }
But successfully rmmod wilc-sdio, the console shell frozen and can't input any cmd, but can log out some extern HW irq relate printk. So i think the kernel is soft lockup after rmmod wilc-sdio.
Anyone have a advise?
@mgkiller7,
I think you are not using the latest driver code and in the recent release, the changes were done to handle a crash issue in debug recovery thread. It should help with the scenario you have mentioned in the 1st point(Long time ping issue). Also, the "scheduling while atomic" warning should not be observed in the latest release. The latest release is shared at wilc_linux_15_3. I suggest verifying the behavior with the latest driver and firmware release.
A similar crash is observed in 2 ways but they are different issues.
-
Long time ping with packet size: In this procedure, It's a temporary failure in SDIO CMD52 and recovery can result in success if the kernel Null pointer crash is avoided. Because during the reinitialized, the chance of SDIO CMD's success is more as its temporary failure. Please verify this scenario with the latest release or with the Kernel NULL pointer Oops fixes.
-
ESD Test to wilc module: There is a continuous SDIO CMD52 failure scenario and even if the recovery code doesn't crash the driver will fail to reinitialize because of continuous CMD52 failure. For recovery code success, the CMD52 should go thorough during re-initialize otherwise no use.
Regarding the shell hang issue, I suspect its happening because debug_thread and timers are not cleaned completely when there is a failure of recovery code (because of continuous CMD52 failure-point 2).
Changes suggested in wilc_send_config_pkt() for 'while' loop termination is not required if we clear the 'wait_for_recovery' in debug_thread() function. Please try below changes and check if it helps to overcome the shell hang issue.
--- a/drivers/staging/wilc1000/wilc_netdev.c
+++ b/drivers/staging/wilc1000/wilc_netdev.c
@@ -106,6 +106,7 @@ static int debug_thread(void *arg)
hif_drv->hif_state = HOST_IF_IDLE;
}
recovery_on = 0;
+ wait_for_recovery = 0;
}
return 0;
}
@@ -1234,6 +1235,12 @@ static int wilc_mac_close(struct net_device *ndev)
wl->open_ifcs--;
} else {
PRINT_ER(ndev, "MAC close called with no opened interfaces\n");
+ if (vif->ndev) {
+ netif_stop_queue(vif->ndev);
+
+ if (!recovery_on)
+ wilc_deinit_host_int(vif->ndev);
+ }
return 0;
}
@@ -1318,6 +1325,14 @@ void wilc_netdev_cleanup(struct wilc *wilc)
}
}
+ if (debug_running) {
+ if (&wilc->debug_thread_started)
+ complete(&wilc->debug_thread_started);
+ if (wilc->debug_thread) {
+ kthread_stop(wilc->debug_thread);
+ wilc->debug_thread = NULL;
+ }
+ }
wilc_wfi_deinit_mon_interface(wilc, false);
flush_workqueue(wilc->hif_workqueue);
Please note these changes may help to just remove the module successfully but to make wilc module work its must to have the success of SDIO CMD.
Thank you very much, ajaykathat Sorry for my delay. I will have a try your advise.
Hi, ajaykathat After add the your modify mentioned above to drivers/staging/wilc1000/wilc_netdev.c,in most cases it help to success to rmmod wilc-sdio indeed. But when i do pressure test last night(my test script loop do: insmod wilc-sdio.ko; try to connect a non-existent wifi ap; after 2mintes do ifconfig wlan0 down; rmmod wilc-sdio; sleep 10s and go on next loop), there is a small probability of rmmod wilc-sdio failure: [12351.270000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_netdev_cleanup]Unregistering netdev c393a000 (hang here, at 12351 Seconds(3.4Hours) after start-up)
The normal rmmod log is: [12178.330000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_netdev_cleanup]Unregistering netdev c393a000 [12178.410000] De-Initializing Locks [12178.450000] Freeing wiphy [12178.490000] Module_exit Done. [12178.490000] at_pwr_dev: deinit [12178.500000] at_pwr_dev: unregistered [12178.510000] mmc0: card 0001 removed
This morning i observed this rmmod hang,i login my board by telnet,and run “ps” cmd, i got: 。。。 8085 root 501:00 [K_TXQ_TASK] 。。。 (i think K_TXQ_TASK may in a dead loop last for 501Mintes(8.35Hours) after rmmod wilc-sdio )
This kernel thread is “int wilc_txq_task(void *vp)” created in wilc//wilc_netdev.c wilc->txq_thread = kthread_run(wilc_txq_task, (void *)wilc, "K_TXQ_TASK");
hi, ajaykathat I found that some logs are missing when doing ifconfig wlan0 down before rmmod wilc-sdio failure. the normal ifconfig wlan0 down is : [12175.270000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mac_close]Mac close [12175.290000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mac_close]Deinitializing wilc [12175.290000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing wilc ... [12175.310000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Disabling IRQ [12175.310000] wilc_sdio mmc0:0001:1: wilc_sdio_disable_interrupt [12175.340000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing Threads [12175.340000] wilc_sdio mmc0:0001:1 wlan0: INFO [wlan_deinitialize_threads]Deinitializing Threads [12175.370000] wilc_sdio mmc0:0001:1 wlan0: INFO [wlan_deinitialize_threads]Deinit debug Thread [12175.390000] Exit debug thread [12175.390000] wilc_sdio mmc0:0001:1 wlan0: INFO [wlan_deinitialize_threads]Deinitializing Threads [12175.430000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing IRQ [12175.450000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing WILC Wlan [12175.470000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]wilc deinitialization Done [12175.480000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.490000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.500000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.510000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.520000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.530000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.540000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed
The bold log above is the missing log of doing the last ifconfig wlan0 down.
So i think static there must be dead loop in wilc_sdio_disable_interrupt() in wilc/wilc_sdio.c void wilc_sdio_disable_interrupt(struct wilc *dev) { struct sdio_func *func = container_of(dev->dev, struct sdio_func, dev); int ret;
dev_info(&func->dev, "%s\n", __func__);
if (sdio_intr_lock == WILC_SDIO_HOST_IRQ_TAKEN)
**wait_event_interruptible(sdio_intr_waitqueue,
sdio_intr_lock == WILC_SDIO_HOST_NO_TAKEN);**
sdio_intr_lock = WILC_SDIO_HOST_DIS_TAKEN;
sdio_claim_host(func);
ret = sdio_release_irq(func);
if (ret < 0)
dev_err(&func->dev, "can't release sdio_irq, err(%d)\n", ret);
sdio_release_host(func);
sdio_intr_lock = WILC_SDIO_HOST_NO_TAKEN;
}
================================================================ And the "scheduling while atomic" warning is also observed with this the latest release driver&firmware of wilc_linux_15_3 in linux-3.10.108.
。。。 [ 136.480000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_initialize]WILC Firmware Ver = WILC_WIFI_FW_REL_15_3 Build: 11606 。。。
wlan0: Trying to associa[ 144.940000] BUG: scheduling while atomic: wpa_supplicant/1064/0x00000202
te with 04:9[ 144.950000] Modules linked in:2:26:c7:3f:a wilc_sdio(C) watchdogd(O) usb_f_rndis u_ether libcomposite
8 (SSID='ASU[ 144.960000] CPU: 0 PID: 1064 Comm: wpa_supplicant Tainted: G C O 3.10.108 #8
S_A8_2G' fre[ 144.970000] Backtrace:
[ 144.970000] [
Hi @mgkiller7, Thanks for the verification. Regarding the failure of 'rmmod' under stress testing issue, I don't think it's related to the suggested fixes. Please share the script to have an idea about the complete test procedure. For the overnight testing did you also enable the failure of CMD52 or CMD52 was getting successful.?
Regarding "scheduling while atomic" warning I don't see with Linux kernel 4.9.36. Let's handle that issue as part of Issue#50
Hi, ajaykathat Thanks for your reply. I can‘t observe any CMD52 error in the whole testing. My stress testing script is: ~ # cat /nvm/wilc_scan.sh while true do # install wilc driver insmod /lib/modules/3.10.108/wilc-sdio.ko sleep 5
# try to connect a non-existent wifi ap
echo "ctrl_interface=/var/run/wpa_supplicant" > /tmp/wilc_wpa_supplicant.conf
echo "update_config=1" >> /tmp/wilc_wpa_supplicant.conf
wpa_supplicant -iwlan0 -Dnl80211 -c /tmp/wilc_wpa_supplicant.conf &
sleep 5
wpa_cli scan
sleep 3
wpa_cli scan_results
wpa_cli -p/var/run/wpa_supplicant ap_scan 1
wpa_cli -p/var/run/wpa_supplicant add_network
#wpa_cli -p/var/run/wpa_supplicant set_network 0 ssid '"ASUS_A8_2G"'
wpa_cli -p/var/run/wpa_supplicant set_network 0 ssid '"None-Exist-AP"'
wpa_cli -p/var/run/wpa_supplicant set_network 0 key_mgmt WPA-PSK
wpa_cli -p/var/run/wpa_supplicant set_network 0 psk '"12345678"'
wpa_cli -p/var/run/wpa_supplicant select_network 0
sleep 30
#udhcpc -i wlan0 -q
# down wlan0
ifconfig wlan0 down
# uninstall wilc driver
rmmod wilc-sdio
# sleep
sleep 2
done
=================================================================
I found that some logs are missing when doing ifconfig wlan0 down before rmmod wilc-sdio failure. the normal ifconfig wlan0 down is : [12175.270000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mac_close]Mac close [12175.290000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mac_close]Deinitializing wilc [12175.290000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing wilc ... [12175.310000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Disabling IRQ [12175.310000] wilc_sdio mmc0:0001:1: wilc_sdio_disable_interrupt [12175.340000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing Threads [12175.340000] wilc_sdio mmc0:0001:1 wlan0: INFO [wlan_deinitialize_threads]Deinitializing Threads [12175.370000] wilc_sdio mmc0:0001:1 wlan0: INFO [wlan_deinitialize_threads]Deinit debug Thread [12175.390000] Exit debug thread [12175.390000] wilc_sdio mmc0:0001:1 wlan0: INFO [wlan_deinitialize_threads]Deinitializing Threads [12175.430000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing IRQ [12175.450000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]Deinitializing WILC Wlan [12175.470000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_wlan_deinitialize]wilc deinitialization Done [12175.480000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.490000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.500000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.510000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.520000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.530000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed [12175.540000] wilc_sdio mmc0:0001:1 wlan0: INFO [wilc_mgmt_frame_register]Return since mac is closed
The bold log above is the missing log of doing the last ifconfig wlan0 down. I can see the debug thread didn't exit yet that time, this means ifconfig wlan0 down is not success complete. So i think this uncomplete ifconfig wlan0 down cause rmmod wilc-sdio hang(mybe some locks didn't release, and ifconfig wlan0 down request it).
i think static there must be dead loop in wilc_sdio_disable_interrupt() in wilc/wilc_sdio.c void wilc_sdio_disable_interrupt(struct wilc *dev) { struct sdio_func *func = container_of(dev->dev, struct sdio_func, dev); int ret;
dev_info(&func->dev, "%s\n", __func__);
if (sdio_intr_lock == WILC_SDIO_HOST_IRQ_TAKEN)
**wait_event_interruptible(sdio_intr_waitqueue,
sdio_intr_lock == WILC_SDIO_HOST_NO_TAKEN);**
sdio_intr_lock = WILC_SDIO_HOST_DIS_TAKEN;
sdio_claim_host(func);
ret = sdio_release_irq(func);
if (ret < 0)
dev_err(&func->dev, "can't release sdio_irq, err(%d)\n", ret);
sdio_release_host(func);
sdio_intr_lock = WILC_SDIO_HOST_NO_TAKEN;
}
@mgkiller7 during the last two years i noticed over and over again that repeated modprobe & rmmod od doing so while on high cpu load, causes the driver (or the whole system) to hang.
To give you an idea of how we handle this see:
my $driverstatus = `lsmod | grep wilc | grep -v -e "cfg80211" | awk '{print \$3}'`;
chomp($driverstatus);
TRACE("driverstatus: $driverstatus");
if ( $driverstatus eq "-1" ) {
print("WILC driver hangs");
I've writte a bit about it in https://github.com/linux4wilc/firmware/issues/7
Your issue is rather a side effect for us because reloading the driver is what we do in order to be able to operate the WILC3000 as an AP at all.
Here as well we can force the whole thing to crash just by doing a
ping -s 65000
instead of
ping
Hi,Mateusz-Gwara Thank you for your sharing. Your skill is good to determine whether the rmmod wilc-sdio in hang,after this hang issue happn then i can reboot my whole board to recovery. But it is not a fundamental solution to this issue.
But it is not a fundamental solution to this issue. ^^this is definitely true
Unfortunately the salesforce support practically stopped working with us, delaying tickets until they expire.
If you find a way to avoid a hanging module which forces the system to be rebooted, I would be very thankful for any information.
I didn't see hang issue with "ping -s 65500" command. I am using SAMA5D4 Xplained + wilc3000 SDIO and using the latest firmware and driver. I have also tested with WILC3000 SPI and didn't observe the immediate hang issue.
@mgkiller7 If you suspect the hang happening in wilc_sdio_disable_interrupt(), can you print the value of "sdio_intr_lock" variable to confirm? Also, keep one debug print while coming out of wilc_sdio_disable_interrupt().
- what is the occurrence rate for this issue. Are you seeing this issue once every ~4 hrs?
- Do you also see this issue if you do not start the wpa_supplicant( i.e scan and connect step)?
@ajaykathat The hang takes several days sometimes to occur. Streaming a youtube video or webcam feed usually causes problems way faster.
Hi, Mateusz-Gwara, ajaykathat Since update to the latest wilc driver&firmware, this issue never seen again. Best regards, mgkiller7
@mgkiller7 Did you use the latest dev or master branch driver & fw???
Regards
Mateusz Gwara