NimBLE-Arduino icon indicating copy to clipboard operation
NimBLE-Arduino copied to clipboard

Fix NimBLEDevice::init() deadlock problem

Open CW-B-W opened this issue 2 years ago • 14 comments

NimBLEDevice::host_task() has priority (configMAX_PRIORITIES - 4), which is defined at https://github.com/h2zero/NimBLE-Arduino/blob/bc333ccb6e8d9ff2059af9cbd5006a290a4de3a5/src/nimble/porting/npl/freertos/src/nimble_port_freertos.c#L60-L61

When the priority of NimBLEDevice::init() is greater than (configMAX_PRIORITIES - 4), NimBLEDevice::init() will block NimBLEDevice::host_task() but still wait for NimBLEDevice::host_task() infinitely (waiting for m_synced to be set), resulting in the deadlock between NimBLEDevice::init() and NimBLEDevice::host_task().

This is caused by this while loop https://github.com/h2zero/NimBLE-Arduino/blob/bc333ccb6e8d9ff2059af9cbd5006a290a4de3a5/src/NimBLEDevice.cpp#L909-L911 When the priority of NimBLEDevice::init() is greater than NimBLEDevice::host_task(), it cannot yield and let NimBLEDevice::host_task() be executed.

Using a binary semaphore can resolve this problem.

CW-B-W avatar Aug 19 '23 03:08 CW-B-W

I've never heard of or experienced this issue before, do you have any examples where this can be reproduced?

h2zero avatar Aug 24 '23 03:08 h2zero

Hello @h2zero,

Here is the minimum reproducible example (it tested it with https://github.com/h2zero/NimBLE-Arduino/commit/bc333ccb6e8d9ff2059af9cbd5006a290a4de3a5),
In this example, it will never print Initialized.
It got stuck inside NimBLEDevice::init(), and the NimBLEDevice was never sucessfully initialized.
And after a few seconds the system will be reset by watchdog.

#include <NimBLEDevice.h>

void InitNimbleTask(void* args)
{
    Serial.println("Initializing NimBLEDevice");
    NimBLEDevice::init("NimBLE-Arduino");
    Serial.println("Initialized");

    NimBLEDevice::startAdvertising();

    vTaskDelete(NULL);
}

void setup() {
    Serial.begin(115200);

    // host_task has priority (configMAX_PRIORITIES - 4)
    const uint32_t priority = (configMAX_PRIORITIES - 3); 

    // host_task is pinned at core 0
    xTaskCreatePinnedToCore(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL, 0); 
}


void loop() {

}

As stated in the PR, I think it's because host_task has priority (configMAX_PRIORITIES - 4)
https://github.com/h2zero/NimBLE-Arduino/blob/bc333ccb6e8d9ff2059af9cbd5006a290a4de3a5/src/nimble/porting/npl/freertos/src/nimble_port_freertos.c#L60-L61
When NimBLEDevice::init() has priority greater than (configMAX_PRIORITIES - 4), the host_task starves.

And if we set the priority of InitNimbleTask to (configMAX_PRIORITIES - 4) or lower, it works again.


Some even trickier situation,

  1. If we add a print in the problematic loop
    https://github.com/h2zero/NimBLE-Arduino/blob/bc333ccb6e8d9ff2059af9cbd5006a290a4de3a5/src/NimBLEDevice.cpp#L909-L911

    while(!m_synced){
        Serial.println("YIELD");
        taskYIELD();
    }
    

    It works again!
    This is because Serial.println() make the NimBLEDevice::init() task go to block state, which gives host_task some time to be executed.

  2. If we don't pin InitNimbleTask at core 0 If we created the task with xTaskCreate instead of xTaskCreatePinnedToCore, it magically works.

    const uint32_t priority = (configMAX_PRIORITIES - 3);
    xTaskCreate(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL);
    

    But if we change the priority to (configMAX_PRIORITIES - 2) or (configMAX_PRIORITIES - 1), it gets stuck again.

    const uint32_t priority = (configMAX_PRIORITIES - 2);
    xTaskCreate(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL);
    

    I don't really know the reason behind, but I guess maybe it's because ESP32 has 2 cores, so the host_task may not be starved.

CW-B-W avatar Aug 24 '23 13:08 CW-B-W

Due to the nature of the BLE stack I would highly recommend using a task priority for the application that is less than the host task. Is there a specific purpose to have your application run at a higher priority than the host task?

h2zero avatar Aug 25 '23 20:08 h2zero

Due to the nature of the BLE stack I would highly recommend using a task priority for the application that is less than the host task. Is there a specific purpose to have your application run at a higher priority than the host task?

Actually, I don't really need the priority of application be higher, I just accidentally found this problem when I set different task priority for my application. I was porting this project to an MCU (other than Arduino, ESP32), and it uses polling to read user command from UART, one of the UART command is to turn BLE on. To make the UART command handler be as responsive as possible, I made its priority the highest configMAX_PRIORITIES - 1, then I found when I launch NimBLEDevice::init() inside this command handler, which has the highest priority, it gets stuck.

CW-B-W avatar Aug 26 '23 11:08 CW-B-W

@CW-B-W That is a great explanation, thank you for this. I have tested your example and propose a different fix.

Instead of the semaphore, could you just replace the taskYIELD() in the while loop with ble_npl_time_delay(1) instead, that should resolve the issue as well without the extra memory requirements.

h2zero avatar Nov 25 '23 02:11 h2zero

Came here to say I've also independently found this issue

i-am-shodan avatar Aug 06 '24 14:08 i-am-shodan

FYI I tried to use ble_npl_time_delay(1) I even went so far as making m_synced volatile. Doesn't work for me

i-am-shodan avatar Aug 06 '24 14:08 i-am-shodan

Doesn't look like the patch works for me either. xSemaphoreTake never returns.

i-am-shodan avatar Aug 06 '24 15:08 i-am-shodan

So after a lot of debugging it looks like one of my own tasks starving resources from NimBLEDevice. From the comments it looks like this lib doesn't need to create it's own task. I'd love to know how to implement this.

i-am-shodan avatar Aug 06 '24 16:08 i-am-shodan

@i-am-shodan you shouldn't need to do anything other that add a call to delay() in your own tasks/loop, this allows for a context switch.

h2zero avatar Aug 10 '24 01:08 h2zero