Long Running Futures Eventually Stall
I am currently developing firmware for the ESP that needs to serve the camera feed as an HTTP stream. For testing purposes, I reduced that further down to serving a simple JPEG animation. I initially used `edge-executor::LocalExecutor` to run my custom async HTTP server, handling every connection in a separate future spawned onto the same thread-local executor. One stream to one client is therefore one (possibly infinitely) running async task.
Unfortunately, I ran into some weird issues: when running multiple streams in parallel, any stream might freeze randomly and never come back to life. Sometimes the whole ESP crashes with an unhandled exception. It does not seem to be the same exception every time, but here is one:
Guru Meditation Error: Core 0 panic'ed (Unhandled debug exception).
Debug exception reason: BREAK instr
Core 0 register dump:
PC : 0x420171d7 PS : 0x00060b36 A0 : 0x82010e96 A1 : 0x3fccf060
0x420171d7 - alloc::task::raw_waker::clone_waker
at ??:??
A2 : 0x3fcd1154 A3 : 0x3fcd1154 A4 : 0x00000000 A5 : 0x3fccf3a0
A6 : 0x3fc9ade8 A7 : 0x3fca12dc A8 : 0xfefefefe A9 : 0xffffffff
0x3fc9ade8 - xKernelLock
at ??:??
0x3fca12dc - pxReadyTasksLists
at ??:??
A10 : 0xfefefefe A11 : 0xfefefefe A12 : 0xfefefefe A13 : 0x00060b23
A14 : 0xfffffffe A15 : 0x0000cdcd SAR : 0x00000020 EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000 LBEG : 0x40056f5c LEND : 0x40056f72 LCOUNT : 0xffffffff
Backtrace: 0x420171d4:0x3fccf060 0x42010e93:0x3fccf080 0x4201800a:0x3fccf9c0 0x4202bb7f:0x3fccf9f0 0x42030838:0x3fccfa10
0x420171d4 - alloc::task::raw_waker::clone_waker
at ??:??
0x42010e93 - std::sys_common::backtrace::__rust_begin_short_backtrace
at ??:??
0x4201800a - core::ops::function::FnOnce::call_once{{vtable.shim}}
at ??:??
0x4202bb7f - std::sys::pal::unix::thread::Thread::new::thread_start
at ??:??
0x42030838 - pthread_task_func
at ??:??
At some point I tried using a different executor. After replacing edge-executor with `futures-executor::LocalPool`, I could not reproduce these issues anymore; the streams seem to keep running without problems. Then I wrote my own version of `LocalExecutor` from scratch, without using any third-party crates, and this also seems to work fine.
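In case it helps the discussion: the core of such a from-scratch thread-local executor can be sketched with nothing but std. This is only an illustrative sketch (the names and structure here are made up for this comment, not taken from my actual implementation):

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type LocalBoxFuture = Pin<Box<dyn Future<Output = ()>>>;

// Waker that re-schedules its task by pushing the task id back
// onto the shared run queue.
struct TaskWaker {
    id: usize,
    run_queue: Arc<Mutex<VecDeque<usize>>>,
}

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.run_queue.lock().unwrap().push_back(self.id);
    }
}

// Minimal thread-local executor: the futures stay on this thread,
// only the id-based run queue is shareable with wakers.
struct LocalExecutor {
    tasks: Vec<Option<LocalBoxFuture>>,
    run_queue: Arc<Mutex<VecDeque<usize>>>,
}

impl LocalExecutor {
    fn new() -> Self {
        Self {
            tasks: Vec::new(),
            run_queue: Arc::new(Mutex::new(VecDeque::new())),
        }
    }

    fn spawn(&mut self, fut: impl Future<Output = ()> + 'static) {
        let id = self.tasks.len();
        self.tasks.push(Some(Box::pin(fut)));
        self.run_queue.lock().unwrap().push_back(id);
    }

    // Run until the queue is drained. A real executor would park the
    // thread here and wait for external wake-ups instead of returning.
    fn run(&mut self) {
        loop {
            let id = match self.run_queue.lock().unwrap().pop_front() {
                Some(id) => id,
                None => return,
            };
            let done = match self.tasks[id].as_mut() {
                Some(fut) => {
                    let waker = Waker::from(Arc::new(TaskWaker {
                        id,
                        run_queue: Arc::clone(&self.run_queue),
                    }));
                    let mut cx = Context::from_waker(&waker);
                    fut.as_mut().poll(&mut cx).is_ready()
                }
                // Spurious wake-up for an already finished task.
                None => false,
            };
            if done {
                self.tasks[id] = None; // drop the finished future
            }
        }
    }
}
```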
This makes me believe that edge-executor has a bug... somewhere. I don't really have a clue where it is. I just found this open issue in async-task talking about tasks randomly not getting rescheduled. That would fit my observations, at least.
Sorry for not providing sample code. I am developing the firmware as an employee and therefore cannot just share our product's firmware. If you are interested, however, I can try to build a minimal example when I find the time.
Aww, this sounds frightening :( . I do hope it is not really async-task as you suggest, or else the whole smol-rs ecosystem is in big trouble.
Perhaps it is related to our queues. Btw - unlike async-executor, edge-executor's queues are bounded by default. This means that if you are spawning extra tasks during the life cycle of your app, and at some point the number of active tasks becomes bigger than the queue size, the app will crash in an unpredictable way - most likely precisely in the waker.
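To make the failure mode concrete - this is a std-only mock written for this comment, not edge-executor's actual code - a fixed-capacity run queue has to reject a push once it is full, so a wake-up arriving at that moment has no slot to land in:

```rust
use std::collections::VecDeque;

// Mock of a bounded run queue: like a fixed-capacity queue,
// `schedule` refuses new items once the capacity is reached.
struct BoundedRunQueue {
    buf: VecDeque<usize>,
    cap: usize,
}

impl BoundedRunQueue {
    fn with_capacity(cap: usize) -> Self {
        Self { buf: VecDeque::with_capacity(cap), cap }
    }

    // Try to schedule a task; `Err` means the wake-up could not be
    // recorded. Unless the executor has a recovery path for this
    // case, the rejected task is never polled again.
    fn schedule(&mut self, task_id: usize) -> Result<(), usize> {
        if self.buf.len() == self.cap {
            Err(task_id)
        } else {
            self.buf.push_back(task_id);
            Ok(())
        }
    }
}
```

Whether the result is a stall (lost wake-up) or a crash depends entirely on how that error path is handled.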
Anything you could share about that problem w.r.t. minimal reproducible case would be greatly appreciated.
If your employer is OK with that, sharing your own Local executor might also be interesting.
Just for testing... what happens if you ~~remove the heapless feature and~~ add the unbounded feature?
I already tested it with the unbounded feature enabled, since I also assumed this to be the root cause. No luck, however. My last test today was with this configuration:
edge-executor = { version = "0.4", features = ["unbounded"] }
I talked to my supervisor and he is okay with providing a minimal reproducible example and also with sharing my own executor implementation. The question is just how to handle the latter part. Maybe, for starters, I push it into a private GitHub repo and give you access? Then we could discuss whether we want to merge it into edge-executor, and how.
For now, I will try to build a minimal example for the bug.
Yes - a private GH repo for the executor would be ideal, many thanks!
Also, the following question: have you tried with async-executor? It is also based on async-task and - moreover - is API-compatible with edge-executor. If async-executor crashes on you as well, chances are there is a serious bug in async-task. Otherwise, there might just be something wrong in edge-executor itself, which should be easier to fix.
Interesting... I cannot reproduce the issue in my minimal example. My best guess would be that this is a compiler bug again. I've seen compiler bugs before in my firmware, with stuff like:
- just including `tokio` as a dependency without using it would make the streaming significantly more performant
- removing logging would make it slow again
- re-including the logging would make it fast again
- removing the logging again wouldn't impact performance... 0.o

I don't know what's happening. But maybe it's not a bug in edge-executor or even async-task, but just some weird side effects?
I don't know whether this is even an option, but how about flashing (a subset of) your firmware on a RISC-V MCU and trying that out instead? The compiler backend for RISC-V is upstream in LLVM, and I guess it is significantly more stable than the Xtensa one.
We are currently chasing this miscompilation issue, which seems to be triggered by a very large alignment (padding) in `ConcurrentQueue` that the Xtensa LLVM backend does not like.
Coincidentally, edge-executor uses `ArrayQueue` by default, which has the very same large cache padding.
The chances of this being the culprit are slim, but trying it out on RISC-V is one option to eliminate this hypothesis; another would be to enable the `heapless` feature and try with the queue from heapless.
A third option is to try your full firmware with async-executor as I mentioned earlier.
Also: are you using any async drivers from esp-idf-hal? Perhaps you can stop those during your test?
> Interesting... I cannot reproduce the issue in my minimal example. My best guess would be that this is a compiler bug again. I've seen compiler bugs before in my firmware, with stuff like:
>
> - just including `tokio` as a dependency without using it would make the streaming significantly more performant
I think you might be getting paranoid :-) The linker would erase all traces of tokio if you don't call into it.
> - removing logging would make it slow again
> - re-including the logging would make it fast again
> - removing the logging again wouldn't impact performance... 0.o
As I said - maybe trying it out on RISC-V is an option. It definitely sounds weird, but without at least seeing a portion of the code, I'm running out of hypotheses.
A short update for today: I can finally reproduce the issue with my minimal example. It happens with both edge-executor and async-executor, but only if PSRAM is enabled, which is... interesting. The exception messages make it look like heap corruption, but enabling the heap integrity checks in the ESP-IDF makes for less helpful panic messages.
Whatever, it will take me until tomorrow to give you a clean report and access to my private executor.
Hmmmmm. This might be related. Are you confining the PSRAM to just one of the two CPUs, or are you happily using malloc with PSRAM enabled?
@fko-kuptec Is this running on the stock ESP32, or on ESP32S3? Asking because - at least in theory - the ESP32S3 should not have the atomics-in-PSRAM issue I mentioned in my previous comment. (Also - and to my understanding - no RISCV chip should be affected by that.)
I am running the tests on ESP32-S3 :)
Ok, so either the S3 is also affected, or there is something else going on with PSRAM. In any case, your reproducible example, as well as your infallible executor, would help greatly in figuring out what is going on. Especially given that you have confirmed that async-executor is also affected.
Also, if you have them handy - can you paste here the stack traces from other crashes you have recorded?
Finally, you can find the reproduction of the bug in this repo. The README includes instructions, information on what works and what doesn't, and a few stack traces.
I have also given you private access to the fko-kuptec/emserve repository, containing my LocalExecutor implementation.
Thanks!
One thing I noticed (still shooting in the dark) is that you are running both async-executor and edge-executor on top of the single-future executor of esp_idf_hal::task::block_on. Have you tried using either of the two, but with futures_lite::task::block_on instead, with PSRAM enabled?
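(For context: such a single-future `block_on` is conceptually just a poll-and-park loop. A minimal std-only sketch - not the actual esp-idf-hal or futures-lite implementation - would be:)

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Waker that unparks the thread blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Drive a single future to completion on the current thread,
// parking the thread whenever the future is not ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

The interesting part w.r.t. this bug is that the waker state (the `Arc` above) is heap-allocated, so *where* that allocation lands (internal RAM vs PSRAM) can differ between `block_on` implementations.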
> One thing I noticed (still shooting in the dark) is that you are running both async-executor and edge-executor on top of the single-future executor of esp_idf_hal::task::block_on. Have you tried using either of the two, but with futures_lite::task::block_on instead, with PSRAM enabled?
I tried it with futures-lite::futures::block_on and edge-executor and now I am getting these panics:
assert failed: block_split tlsf.c:410 (block_to_ptr(remaining) == align_ptr(block_to_ptr(remaining), ALIGN_SIZE) && "remaining block not aligned properly")
Backtrace: 0x40375cfa:0x3fcc0190 0x4037cded:0x3fcc01b0 0x403846c5:0x3fcc01d0 0x40382a4b:0x3fcc02f0 0x40382427:0x3fcc0310 0x403763a3:0x3fcc0330 0x403763ee:0x3fcc0360 0x403846d5:0x3fcc0380 0x4202f9b4:0x3fcc03a0 0x4202fa5c:0x3fcc03c0 0x4202fa9b:0x3fcc03e0 0x42034a6b:0x3fcc0400 0x42034efa:0x3fcc0430 0x4203e8e0:0x3fcc0480 0x4203f68c:0x3fcc04b0 0x4202f769:0x3fcc04d0
0x40375cfa - panic_abort
at ??:??
0x4037cded - esp_system_abort
at ??:??
0x403846c5 - __assert_func
at ??:??
0x40382a4b - tlsf_malloc
at ??:??
0x40382427 - multi_heap_malloc_impl
at ??:??
0x403763a3 - heap_caps_malloc_base
at ??:??
0x403763ee - heap_caps_malloc_default
at ??:??
0x403846d5 - pvalloc
at ??:??
0x4202f9b4 - mem_malloc
at ??:??
0x4202fa5c - do_memp_malloc_pool$isra$0
at ??:??
0x4202fa9b - memp_malloc
at ??:??
0x42034a6b - tcp_create_segment
at ??:??
0x42034efa - tcp_write
at ??:??
0x4203e8e0 - lwip_netconn_do_writemore$isra$0
at ??:??
0x4203f68c - lwip_netconn_do_write
at ??:??
0x4202f769 - tcpip_thread
at ??:??
Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x40382b4a PS : 0x00060133 A0 : 0x8038242a A1 : 0x3fcabaf0
0x40382b4a - tlsf_malloc
at ??:??
A2 : 0x3fccc91c A3 : 0x3fca1eac A4 : 0x00060720 A5 : 0x00060123
A6 : 0x00000001 A7 : 0x3fca1e78 A8 : 0x3fca1e54 A9 : 0xffffffff
A10 : 0x3fccc914 A11 : 0x00000019 A12 : 0x3fccc9e8 A13 : 0x00000010
A14 : 0x00000020 A15 : 0x00000005 SAR : 0x0000001b EXCCAUSE: 0x0000001d
EXCVADDR: 0x0000000b LBEG : 0x40056f5c LEND : 0x40056f72 LCOUNT : 0xffffffff
Backtrace: 0x40382b47:0x3fcabaf0 0x40382427:0x3fcabb10 0x403763a3:0x3fcabb30 0x403763b9:0x3fcabb60 0x40379035:0x3fcabb80 0x4209d24d:0x3fcabba0 0x40385e65:0x3fcabbd0 0x400398b5:0x3fcabc00 0x40383c31:0x3fcabc20 0x4038543d:0x3fcabc70
0x40382b47 - tlsf_malloc
at ??:??
0x40382427 - multi_heap_malloc_impl
at ??:??
0x403763a3 - heap_caps_malloc_base
at ??:??
0x403763b9 - heap_caps_malloc
at ??:??
0x40379035 - esp_coex_common_malloc_internal_wrapper
at ??:??
0x4209d24d - esf_buf_alloc_dynamic
at ??:??
0x40385e65 - esf_buf_alloc
at ??:??
0x40383c31 - ieee80211_output_process
at ??:??
0x4038543d - ppTask
at ??:??
Note, though, that I am also running my own executor with the ESP-IDF `block_on()`, which seems to work. Therefore, I don't think that's the root cause.
Oh, and regarding RISC-V: unfortunately, we don't have any normal ESP32 RISC-V devkits here. The only thing we have is the brand-new P4 evaluation kit, but that's a different kind of beast...
I was able to reproduce your crashes twice or thrice with a stock ESP32 + PSRAM (I don't have any S3 with PSRAM).
I'm not able to reproduce your crashes without psram so far, but honestly I did not try hard enough.
Even with psram, reproducing it after the first crash was a bit difficult. I had to wait quite a bit (sometimes 15+ minutes) until it happened.
Neither with nor without psram am I observing any delays or stalls. Might be an S3-specific manifestation of the same problem.
It is true that opening a 3rd, 4th, 6th Chrome window becomes slower and slower (the ESP accepting a 7th connection does not even work for me!), but this happens with or without psram, and I think it is just a matter of how many sockets the poor little ESP can handle simultaneously.
I still suspect the problem is related to atomics not working OK when they are in psram, but I can't prove it yet (I noticed your own executor does not use any atomics). I will continue working on this next week; just keep in mind it might take a bit of time (I will be away on Monday).
Thank you for taking a look :)
> I'm not able to reproduce your crashes without psram so far, but honestly I did not try hard enough.
Without PSRAM I couldn't find any crashes, as well (see the table in the repo's README). The issue only seems to occur when PSRAM is enabled.
> Even with psram, reproducing it after the first crash was a bit difficult. I had to wait quite a bit (sometimes 15+ minutes) until it happened.
It's quite random, that's true. For me, however, the stalls or crashes usually start to show up within one or two minutes, maybe.
> It is true that opening a 3rd, 4th, 6th Chrome window becomes slower and slower (the ESP accepting a 7th connection does not even work for me!), but this happens with or without psram, and I think it is just a matter of how many sockets the poor little ESP can handle simultaneously.
Yeah, the poor thing is not really made for that. But becoming slower is not really my concern; it's just that it stalls for me under the described circumstances.
> noticed your own executor does not use any atomics
Oh, my executor is using atomics, just the ones from std. So, no extra dependency for that ^^
> I will continue working on this next week; just keep in mind it might take a bit of time (I will be away on Monday).
I am using my own executor in our firmware, which seems to run rock-solid now. So no pressure :)
@fko-kuptec While it is true that I did not look any further into this issue, what I can confirm in the meantime is the following: on the esp32s3,

- if you have enabled PSRAM (`CONFIG_SPIRAM=y`), and
- if you have enabled `CONFIG_SPIRAM_USE_MALLOC=y`,

you'll surely have issues, regardless of what executor you use, because on the S3, atomics in PSRAM simply do not work.

The only way to use PSRAM on that chip currently is to make really sure that you don't have any Rust atomics in a chunk of memory that happens to live in PSRAM. In essence, this means you need to switch off `CONFIG_SPIRAM_USE_MALLOC` if you have enabled it.
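For reference, a sketch of `sdkconfig.defaults` settings consistent with the above - PSRAM on, but kept out of the default `malloc` path. (Whether `CONFIG_SPIRAM_USE_CAPS_ALLOC` fully avoids the problem is exactly what is still being debugged in this thread, so treat this as a starting point, not a fix:)

```
CONFIG_SPIRAM=y
# Keep the general-purpose allocator out of PSRAM, so that Rust
# atomics never end up there:
# CONFIG_SPIRAM_USE_MALLOC is not set
# Still allow explicit PSRAM allocations via heap_caps_malloc():
CONFIG_SPIRAM_USE_CAPS_ALLOC=y
```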
@ivmarkov Thanks for your reply! :)
I looked at the reproduction repo, and I am only using `CONFIG_SPIRAM_USE_CAPS_ALLOC=y`. As far as I remember, the issues also didn't occur with my own executor implementation. I only use std atomics there, but I don't know if that makes any difference. Furthermore, I don't think I even actively used PSRAM in that example; I just enabled it, which seems to be enough to trigger the issues with some executors.