ARM TL1 memory error: "double free or corruption (!prev)" and "free(): invalid next size (normal)"
I encountered this on an Orange Pi 5 Plus (RK3588) running Ubuntu 22.04 LTS. The build environment (inside a Conda env) is:
- Python 3.9.21
- CMake 3.31.2
- Clang 18.1.8
Compilation succeeded, but both run_inference.py (llama-cli) and e2e_benchmark.py (llama-bench) aborted with memory errors. In the benchmark case the results are still printed before the abort, but I'm not sure whether the error affects the measured performance.
Note that the error differs between benchmarking prefilling (p>0, n=0) and decoding (p=0, n>0):
- "double free or corruption (!prev)" in prefilling
- "free(): invalid next size (normal)" in decoding
Here are the corresponding commands and outputs:
(bitnet-cpp) orangepi@orangepi5plus:~/repos/BitNet$ python utils/e2e_benchmark.py -m ~/models/bitnet_b1_58-3B/ggml-model-tl1.gguf -n 0 -p 128 -t 4
| model | size | params | backend | threads | n_batch | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | ------------: | -------------------: |
| bitnet 3B TL1 | 965.22 MiB | 3.32 B | CPU | 4 | 1 | pp128 | 13.88 ± 0.07 |
double free or corruption (!prev)
ERROR:root:Error occurred while running command: Command '['/home/orangepi/repos/BitNet/build/bin/llama-bench', '-m', '/home/orangepi/models/bitnet_b1_58-3B/ggml-model-tl1.gguf', '-n', '0', '-ngl', '0', '-b', '1', '-t', '4', '-p', '128', '-r', '5']' died with <Signals.SIGABRT: 6>.
(bitnet-cpp) orangepi@orangepi5plus:~/repos/BitNet$ python utils/e2e_benchmark.py -m ~/models/bitnet_b1_58-3B/ggml-model-tl1.gguf -n 32 -p 0 -t 4
| model | size | params | backend | threads | n_batch | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | ------------: | -------------------: |
| bitnet 3B TL1 | 965.22 MiB | 3.32 B | CPU | 4 | 1 | tg32 | 14.43 ± 0.19 |
free(): invalid next size (normal)
ERROR:root:Error occurred while running command: Command '['/home/orangepi/repos/BitNet/build/bin/llama-bench', '-m', '/home/orangepi/models/bitnet_b1_58-3B/ggml-model-tl1.gguf', '-n', '32', '-ngl', '0', '-b', '1', '-t', '4', '-p', '0', '-r', '5']' died with <Signals.SIGABRT: 6>.
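To check whether the results table is always printed before the abort, the benchmark binary can be run from a small wrapper that captures its output and reports how it exited. This is only a minimal sketch, not code from the BitNet repository: `describe_exit` and `run_and_report` are hypothetical helper names, and it relies on the POSIX behaviour that `subprocess` reports a child killed by a signal as a negative return code. It does not fix the crash; it only separates "output was produced" from "process died with SIGABRT".

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable status string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by
        # the signal with that number (e.g. -6 for SIGABRT on Linux).
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd, returning (stdout, stderr, status) without raising on failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout, proc.stderr, describe_exit(proc.returncode)
```

Calling `run_and_report` with the same llama-bench argument list that appears in the error message above would return the table in `stdout` and the glibc message in `stderr`, with a status of `killed by SIGABRT` if the abort is reproduced.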
This may be the same issue as https://github.com/microsoft/BitNet/issues/143, but I'm not sure.
Kernel information:
Linux orangepi5plus 6.1.43-rockchip-rk3588 #1.2.0 SMP Thu Nov 21 12:08:24 CST 2024 aarch64 aarch64 aarch64 GNU/Linux
@xxxxyu, I implemented a fix for this problem (#164).