esp32-llm
Running an LLM on the ESP32
Demo video: https://youtu.be/ILh38jd0GNU
Summary
I wanted to see if it was possible to run a Large Language Model (LLM) on the ESP32. Surprisingly, it is possible, though probably not very useful.
The "Large" Language Model used is actually quite small: a 260K-parameter tinyllamas checkpoint trained on the TinyStories dataset.
The LLM implementation uses llama2.c with minor optimizations to make it run faster on the ESP32.
Hardware
LLMs require a great deal of memory; even this small model still needs about 1 MB of RAM. I used the LILYGO T-Camera S3 ESP32-S3 because it has 8MB of embedded PSRAM and a screen.
Optimizing Llama2.c for the ESP32
With the following changes to llama2.c, I am able to achieve 19.13 tok/s:
- Using both cores of the ESP32 during math-heavy operations.
- Using the special dot-product functions from the ESP-DSP library that are designed for the ESP32-S3. These functions use some of the few SIMD instructions the ESP32-S3 has.
- Maxing out the CPU speed at 240 MHz and the PSRAM speed at 80 MHz, and increasing the instruction cache size.
Setup
This requires the ESP-IDF toolchain to be installed.
idf.py build
idf.py -p /dev/{DEVICE_PORT} flash
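The clock and cache settings described above live in the project's sdkconfig. Assuming standard ESP-IDF option names for the ESP32-S3 (these exact names are my assumption; verify them with `idf.py menuconfig`), the relevant fragment would look roughly like:

```
# Hypothetical sdkconfig fragment -- confirm option names in menuconfig
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240=y
CONFIG_SPIRAM_SPEED_80M=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
```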