esp32_https_server Heap consumption of SSL

While testing your web server sources I found that the memory consumption of the server is quite high. Looking into it in more detail, I saw that the call to SSL_new() takes a large amount of heap: roughly 37 kB for each connection. I am thinking about an application where I would like to store a larger web page on an SD card attached to the ESP32. This application will probably be implemented using the vue.js and vuetify frameworks. When loading this page from the ESP32, the browser will open several SSL connections simultaneously (for reading jpg, css, js, html, ...). Five to six simultaneous connections will be the usual case. This will drain the available memory quite fast. In your example application the free heap is roughly 212 kB, so at most five connections are possible. Do you have a hint on how to handle this situation?
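The estimate in the paragraph above can be written down as a small helper. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it ignores heap fragmentation and everything else the application allocates.

```cpp
#include <cassert>
#include <cstddef>

// Rough upper bound on simultaneous TLS connections. It ignores heap
// fragmentation and every other allocation the sketch makes.
// maxTlsConnections is a hypothetical helper, not part of the library.
inline std::size_t maxTlsConnections(std::size_t freeHeapBytes,
                                     std::size_t bytesPerConnection) {
  assert(bytesPerConnection > 0);
  return freeHeapBytes / bytesPerConnection;
}
```

With the figures from the post, maxTlsConnections(212 * 1024, 37 * 1024) evaluates to 5.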

Kind regards Lothar

Aug 23 '18 19:08 squonk11

That memory consumption is the reason why maxConnections is set to 4 by default in the constructor of HTTPSServer. So far, I haven't been able to figure out a better solution, and finding one might require a bit of digging in the OpenSSL API. However, the library implements some measures to reduce the effect of this problem:

It sets an SSL session timeout of 5 minutes. A browser should be able to reuse the SSL session, which shortens the handshake notably because the time-consuming public key crypto can be skipped. The tradeoff is a bit more memory consumption to store the SSL sessions. This should work for your scenario.
The server tries to set Connection: keep-alive, which allows the client to reuse a connection without closing it, so no additional SSL or TCP handshake is required. However, to make this possible, the server needs to know the content length in advance to set the corresponding response header. To do so, the response is cached internally up to a size of 1400 bytes. If it gets bigger, the server favors saving memory and drops keep-alive support by sending a response without the Content-Length header. This will be a problem for your use case, as vue.js applications will easily exceed the 1400-byte threshold. You could adjust the limit in the HTTPSServerConstants.hpp file, but that would affect every request. Another solution would be to allow the request handler function to set the content length manually, as I assume that you know the sizes of the files stored on your SD card. So far, the server does not recognize a Content-Length that is set via setHeader(name,value) of HTTPResponse in the request handler function. Interpreting this value could enable keep-alive for big files in certain scenarios, but I would have to implement that first.
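The keep-alive rule described above can be sketched as a small decision function. This is not the library's code: the constant, the struct and the function are hypothetical names, and the lengthKnownUpFront flag stands for the not-yet-implemented option of letting the handler declare the content length itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the library's internal response cache limit.
constexpr std::size_t kResponseCacheSize = 1400;

struct ResponsePlan {
  bool keepAlive;          // true if the connection can be reused
  bool sendContentLength;  // true if a Content-Length header is sent
};

// Decide how to answer, given the body size and whether the handler
// declared the length before sending data (a proposed feature, assumed here).
inline ResponsePlan planResponse(std::size_t bodyBytes, bool lengthKnownUpFront) {
  if (lengthKnownUpFront || bodyBytes <= kResponseCacheSize) {
    // The length is known before the headers go out, so keep-alive works.
    return {true, true};
  }
  // Body exceeds the cache: stream it without a length and close afterwards.
  return {false, false};
}
```

Under these assumptions, a 5000-byte file only keeps the connection alive when the handler supplies the length up front.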

My tests with the example sketch in Chromium and Firefox (using the Developer Tools and showing the waterfall diagram in the network tab) suggest that both measures are helpful for real-world applications with a few resource files and I would give it a try. Also you could adjust cache settings for your scripts, so that the big static files are stored at the client for a longer time.

Additionally I will implement a function to set the content size in the handler function (see #12), but again, that may take me some days.

Aug 24 '18 08:08 fhessel

Concerning the memory consumption of SSL I just found another interesting option: Espressif recently published a new ESP32 module (ESP32-WROVER) with 8 MB of additional PSRAM. This RAM cannot be used as program RAM, but it can be used as data RAM (including heap and stack). Maybe this could solve the memory issue. But I am not sure whether this RAM can also be accessed by the encryption unit. That might be necessary if the RAM allocated by SSL_new() is used as a buffer for TLS encryption. I will try to check this out.

Concerning the keep-alive scenario I think there are two possible ways to manage it. One option is to use the Content-Length header. The other (maybe easier) option is chunked data transfer. With chunked data transfer you can send data chunks of any known length and finish the whole transfer with a zero-length chunk. I think this could also be an interesting option. What do you think?
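For reference, the framing used by chunked transfer coding (HTTP/1.1, RFC 7230 section 4.1) is simple enough to sketch in a few lines. The helper names below are made up for illustration and are not part of the library; the response would also need a Transfer-Encoding: chunked header.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Frame one chunk: <size in hex>\r\n<data>\r\n
// An empty chunk would look like the terminator, so it is rejected here.
inline std::string encodeChunk(const std::string& data) {
  assert(!data.empty());
  char sizeLine[32];
  std::snprintf(sizeLine, sizeof(sizeLine), "%zx\r\n", data.size());
  return std::string(sizeLine) + data + "\r\n";
}

// The zero-length chunk that ends the body (no trailer fields).
inline std::string lastChunk() { return "0\r\n\r\n"; }
```

For example, encodeChunk("Hello") yields "5\r\nHello\r\n", and the body is closed with lastChunk(). Because each chunk carries its own length, the connection can stay open without knowing the total size in advance.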

Playing with the cache settings in my scripts is an interesting hint. I will try this.

Best regards Lothar

Aug 24 '18 13:08 squonk11

Adding chunked data transfer seems to be a good idea to get around the hard buffer length of 1400 bytes. I think I would add the functionality to the HTTPResponse class so that the handler function can enable this behavior before sending the first data and the default behavior is still to try to send a single response packet.

Aug 27 '18 14:08 fhessel

I just ran into this issue as well, and it is very hard to debug (because the only indication is the [HTTPS:E] SSL_new failed. message).

Suggestion: if you replace that error with something like

        HTTPS_LOGE("SSL_new failed. Aborting handshake. FID=%d", resSocket);
        size_t memAvail = heap_caps_get_free_size(MALLOC_CAP_8BIT);
        HTTPS_LOGE("Available mem: %ld bytes", (long)memAvail);

it may help a little in debugging out-of-memory issues.

Apr 09 '20 22:04 jackjansen

Do you have a way to "reliably" run into this problem? I've finally got an ESP-prog so I could investigate that further.

And maybe it would also be a good idea to log heap_caps_get_largest_free_block, as the problem often isn't the amount of free memory, but finding a contiguous region. SSL_new seems to allocate a big chunk at once.
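A minimal sketch of such a log line, written so that it also compiles off-target: the formatting helper is hypothetical, and on the ESP32 the two arguments would come from heap_caps_get_free_size() and heap_caps_get_largest_free_block(), both called with the same capability flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Build the diagnostic text. describeHeap is a made-up helper; the caller is
// expected to pass in the values reported by the heap_caps_* functions.
inline std::string describeHeap(std::size_t freeBytes, std::size_t largestBlock) {
  // The largest contiguous block can never exceed the total free size.
  assert(largestBlock <= freeBytes);
  char line[96];
  std::snprintf(line, sizeof(line),
                "free size: %zu bytes, largest free block: %zu bytes",
                freeBytes, largestBlock);
  return line;
}
```

A large gap between the two numbers is the sign of fragmentation that a plain free-size log would hide.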

Apr 09 '20 22:04 fhessel

The easiest way to reliably run into the problem (in a test situation) would be to allocate a large amount of memory before the first incoming https connection. malloc(heap_caps_get_free_size()-20000), or something similar with heap_caps_get_largest_free_block() would seem like a good option.

I've tried debugging this (with a minimodule), but the SSL_new call was as deep as I got, because below that there's no source code available (at least I couldn't find it).

I personally ran into the problem in the context of a https://github.com/cwi-dis/iotsa program that used esp32_https_server_compat and used the esp32 BLE library (which is also a memory hog) and allocated a large amount of memory itself to do NeoPixel animation.

Apr 09 '20 23:04 jackjansen

The problem is that the Arduino core ships with precompiled esp-idf libraries. They do come with debugging information (you'll see that, e.g., when running the stack trace decoder or by running xtensa-esp32-elf-objdump on one of the libs). But the source is only available through the corresponding modules in the IDF. It might indeed be a bit tricky to match or integrate that with PlatformIO. And I haven't tried whether I can easily build the whole Arduino core myself.

> The easiest way to reliably run into the problem (in a test situation) would be to allocate a large amount of memory before the first incoming https connection. malloc(heap_caps_get_free_size()-20000), or something similar with heap_caps_get_largest_free_block() would seem like a good option.

That would work if we assume that a memory allocation failure is the root cause for SSL_new to fail (which it most likely is, based on what we observe, but well...).

I'll see if I can get the debugger further down the stack.

Edit: What module are you using? The default WROOM? Does it have PSRAM and are you using that? Just so that I match your situation well.

Apr 09 '20 23:04 fhessel

I'm using a lot of different modules. I originally came across the problem on a pico32 (which is a convenient module to put into my LED strips), but it doesn't support the minimodule, so I switched to a lolin32. Neither has PSRAM.

But I did some reading up, and the core of the problem is the Bluedroid BT stack. It allocates 180KB at startup. And there's a call to free memory you don't need (for example classic BT memory if you're only using BLE) but it doesn't seem to work. So practically I'm going to wait for NimBLE to be an option (as an alternative to Bluedroid) with esp32/Arduino. And I'll just not combine BLE and HTTPS until then.

Apr 10 '20 00:04 jackjansen

So I did a bit of debugging on hardware to see what exactly is failing.

Application: I started with the "Static Page" example, and added a memory allocation at the end of the setup() function. Basically, I am calling new char[20000]; seven times. That leaves 62492 bytes of memory, with a largest free block of 33728 bytes for my application. The code to log the free memory was as follows:

Serial.printf("Default Memory:       free size: %8u bytes   largest free block: %8u\n",
  heap_caps_get_free_size(MALLOC_CAP_DEFAULT),
  heap_caps_get_largest_free_block(MALLOC_CAP_DEFAULT));
// explanation for the following line comes below:
Serial.printf("Internal 8bit Memory: free size: %8u bytes   largest free block: %8u\n",
  heap_caps_get_free_size(MALLOC_CAP_INTERNAL|MALLOC_CAP_8BIT),
  heap_caps_get_largest_free_block(MALLOC_CAP_INTERNAL|MALLOC_CAP_8BIT));

Debugger Setup: I connected an esp-prog via JTAG to the module's JTAG port and used the dev board's FTDI. The platformio.ini file looks like this:

[platformio]
env_default = wroom-debug

[env:wroom-debug]
upload_protocol = esptool
upload_speed = 921600
upload_port = /dev/ttyUSB2
debug_tool = esp-prog
debug_init_break = tbreak setup
build_flags =
  -DHTTPS_LOGLEVEL=5
  -DHTTPS_LOGTIMESTAMP

With that and using Visual Studio Code, PlatformIO will generate debug profiles that can be used to start the application. Now comes the trick to be able to debug the esp-idf modules. Clone the esp-idf in version 3.2.3:

mkdir -p ~/tmp
git clone -b v3.2.3 https://github.com/espressif/esp-idf.git ~/tmp/esp-idf
# Also init mbedtls
cd ~/tmp/esp-idf
git submodule update --init components/mbedtls/mbedtls

Then you can start a debugging session. As soon as the debugger is connected and the breakpoint in setup() is reached, go to the debugger command line below the debugger output and enter:

set substitute-path /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components /home/<your username>/tmp/esp-idf/components

Espressif seems to use that working directory to create the Arduino Core, so by telling gdb to map their build folder (which is thankfully included in the linked binary) to your local IDF, VSCode will open the right .c-Files if you step into functions from the IDF, like SSL_new.

Result: Long story short – This call to mbedtls_calloc fails during SSL_new. It tries to allocate a single chunk of 16717 bytes with MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT capabilities. That should fit into the 33728 bytes reported initially, but SSL_new allocates quite a bit on its own before. And as you can see in ssl_tls.c, there comes a call to mbedtls_calloc after that for the ssl->out_buf which has roughly the same size. So I would assume that you need at least around 50k of memory free when entering SSL_new, and you need it ideally in big chunks of at least 17k.
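The arithmetic behind that failure can be checked with a small helper. This is an illustration only: fitsInOneBlock is a made-up function, it ignores the allocator's per-block bookkeeping, and the 1024-byte figure in the usage note is a placeholder for the unknown amount SSL_new allocates before the two large buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if all requests can be carved, back to back, out of one contiguous
// block. Allocator overhead is ignored, so the answer is optimistic.
inline bool fitsInOneBlock(std::size_t blockBytes,
                           const std::vector<std::size_t>& requests) {
  std::size_t used = 0;
  for (std::size_t request : requests) {
    if (request > blockBytes - used) {
      return false;  // not enough contiguous space left for this request
    }
    used += request;
  }
  return true;
}
```

Two buffers of 16717 bytes just fit into the 33728-byte block (fitsInOneBlock(33728, {16717, 16717}) is true), but as soon as an assumed 1024 bytes of earlier allocations land in the same block, fitsInOneBlock(33728, {1024, 16717, 16717}) is false. That leaves less than 300 bytes of slack, which is consistent with the second large allocation failing in practice.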

So I assume PSRAM won't help you out (due to MALLOC_CAP_INTERNAL). I had thought that enabling it would allow for more connections, so maybe I need to double-check that.

Fun fact: OpenSSL and mbedtls would tell you exactly what is failing if logging for esp-idf could be enabled :unamused:

Apr 19 '20 00:04 fhessel

I've only inspected the code, not built idf myself to run it under the debugger, so I'm not sure which compile-time options in mbedtls are enabled. But maybe mbedtls_platform_set_calloc_free() could be used to override mbedtls_calloc. Then, for those two buffers, you could make sure they are pre-allocated (before fragmentation sets in) and hand them out at the right time.
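A rough sketch of that idea, kept independent of mbedtls so it compiles anywhere. All names and sizes are hypothetical, the size check is only a heuristic for recognizing the two TLS buffers, and the code is not thread-safe. The two hooks have the calloc/free signatures that mbedtls_platform_set_calloc_free() expects, but whether the Arduino core's prebuilt mbedtls enables MBEDTLS_PLATFORM_MEMORY has not been verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical sizes: requests of at least kLargeRequest bytes are treated as
// the TLS in/out buffers and served from memory reserved at startup.
constexpr std::size_t kLargeRequest = 16 * 1024;
constexpr std::size_t kSlotSize = 17 * 1024;
constexpr int kSlots = 2;

static unsigned char* g_slots[kSlots] = {nullptr, nullptr};
static bool g_slotInUse[kSlots] = {false, false};

// Call once at startup, before the heap gets fragmented.
bool reserveTlsBuffers() {
  for (int i = 0; i < kSlots; ++i) {
    g_slots[i] = static_cast<unsigned char*>(std::malloc(kSlotSize));
    if (g_slots[i] == nullptr) return false;
  }
  return true;
}

// calloc-compatible hook: large requests come from a free reserved slot,
// everything else (and any overflow) falls through to the normal heap.
void* tlsCalloc(std::size_t count, std::size_t size) {
  if (size != 0 && count <= kSlotSize / size) {  // also guards count * size
    const std::size_t bytes = count * size;
    if (bytes >= kLargeRequest) {
      for (int i = 0; i < kSlots; ++i) {
        if (g_slots[i] != nullptr && !g_slotInUse[i]) {
          g_slotInUse[i] = true;
          std::memset(g_slots[i], 0, bytes);  // calloc returns zeroed memory
          return g_slots[i];
        }
      }
    }
  }
  return std::calloc(count, size);
}

// free-compatible hook: reserved slots are marked free but never released.
void tlsFree(void* ptr) {
  for (int i = 0; i < kSlots; ++i) {
    if (ptr != nullptr && ptr == g_slots[i]) {
      g_slotInUse[i] = false;
      return;
    }
  }
  std::free(ptr);
}
```

If the platform option is available, the hooks would be registered once with mbedtls_platform_set_calloc_free(tlsCalloc, tlsFree) after reserveTlsBuffers() has succeeded. Note that two slots cover only one connection at a time; further connections fall back to the regular heap.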

Apr 19 '20 00:04 jackjansen

Short addition regarding PSRAM: On a WROVER board, you get:

Default Memory:       free size:  4256364 bytes   largest free block:  4194252
Internal 8bit Memory: free size:    62112 bytes   largest free block:    33728

So you have more memory, but not with the capabilities required by mbedtls. It might just be that other parts of the application used that additional memory, so there was more left for mbedtls to work with.

> I've only inspected the code, not built idf myself to run it under the debugger.

You don't need to build it. If you use that set substitute-path you can debug through everything that's public.

> but maybe mbedtls_platform_set_calloc_free() could be used to override mbedtls_calloc

That sounds interesting. I need to check whether that works with the configuration used for Arduino. If it works, however, it will be a bit hard to hook only into the right calls to mbedtls_calloc. And you do not gain that much: on a WROVER, there's only 32k more of fragmented memory that you could use. More interesting to me would be to change the required capabilities to allow usage of external memory. I mean, if that gets lost because you put your ESP into sleep mode, the TLS connection is dead anyway. The only problem with external memory is that it would in principle allow someone with hardware access to snoop on the session keys by hooking a signal analyzer up to the memory bus.

Apr 19 '20 00:04 fhessel
