ARM TL1 memory error: "double free or corruption (!prev)" and "free(): invalid next size (normal)"

Open xxxxyu opened this issue 1 year ago • 2 comments

I encountered this on an Orange Pi 5 Plus (RK3588) running Ubuntu 22.04 LTS. The build environment (inside a Conda env) is:

  • Python 3.9.21
  • CMake 3.31.2
  • Clang 18.1.8

Compilation succeeded, but both run_inference.py (llama-cli) and e2e_benchmark.py (llama-bench) exited with memory errors. In the benchmark case the results are still printed before the crash, but I'm not sure whether the errors affect the measured performance.

Note that the errors differ between benchmarking prefilling (p>0, n=0) and decoding (p=0, n>0):

  • "double free or corruption (!prev)" in prefilling
  • "free(): invalid next size (normal)" in decoding

Here are the corresponding commands and outputs:

(bitnet-cpp) orangepi@orangepi5plus:~/repos/BitNet$ python utils/e2e_benchmark.py -m ~/models/bitnet_b1_58-3B/ggml-model-tl1.gguf -n 0 -p 128 -t 4
| model                          |       size |     params | backend    | threads | n_batch |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | ------------: | -------------------: |
| bitnet 3B TL1                  | 965.22 MiB |     3.32 B | CPU        |       4 |       1 |         pp128 |         13.88 ± 0.07 |
double free or corruption (!prev)
ERROR:root:Error occurred while running command: Command '['/home/orangepi/repos/BitNet/build/bin/llama-bench', '-m', '/home/orangepi/models/bitnet_b1_58-3B/ggml-model-tl1.gguf', '-n', '0', '-ngl', '0', '-b', '1', '-t', '4', '-p', '128', '-r', '5']' died with <Signals.SIGABRT: 6>.
(bitnet-cpp) orangepi@orangepi5plus:~/repos/BitNet$ python utils/e2e_benchmark.py -m ~/models/bitnet_b1_58-3B/ggml-model-tl1.gguf -n 32 -p 0 -t 4
| model                          |       size |     params | backend    | threads | n_batch |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | ------------: | -------------------: |
| bitnet 3B TL1                  | 965.22 MiB |     3.32 B | CPU        |       4 |       1 |          tg32 |         14.43 ± 0.19 |
free(): invalid next size (normal)
ERROR:root:Error occurred while running command: Command '['/home/orangepi/repos/BitNet/build/bin/llama-bench', '-m', '/home/orangepi/models/bitnet_b1_58-3B/ggml-model-tl1.gguf', '-n', '32', '-ngl', '0', '-b', '1', '-t', '4', '-p', '0', '-r', '5']' died with <Signals.SIGABRT: 6>.
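
To record the benchmark rows and the abort separately, a wrapper along the following lines could be used. This is only a minimal sketch: `run_and_classify` is a hypothetical helper, not part of the BitNet utilities, and the child process below is a stand-in that prints one line and then aborts, not a real llama-bench run. It relies on the POSIX convention that `subprocess` reports a child killed by signal N with return code -N.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd and return (stdout, signal_name_or_None, returncode).

    Hypothetical helper for illustration only.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    sig_name = None
    if proc.returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        sig_name = signal.Signals(-proc.returncode).name
    return proc.stdout, sig_name, proc.returncode


# Stand-in for the benchmark binary (assumption): print a row, flush, abort.
child = [
    sys.executable,
    "-c",
    "import os, sys; print('pp128 row'); sys.stdout.flush(); os.abort()",
]

out, sig, code = run_and_classify(child)
print("captured output:", out.strip())
print("terminated by:", sig)
```

In practice the stand-in command would be replaced by the llama-bench invocation shown in the error messages above. Whether the figures captured this way are trustworthy is a separate question that the sketch does not answer.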

This looks similar to https://github.com/microsoft/BitNet/issues/143, but I'm not sure it is the same issue.

xxxxyu, Dec 26 '24 05:12

Kernel information:

Linux orangepi5plus 6.1.43-rockchip-rk3588 #1.2.0 SMP Thu Nov 21 12:08:24 CST 2024 aarch64 aarch64 aarch64 GNU/Linux

xxxxyu, Dec 26 '24 05:12

@xxxxyu, I implemented a fix for this problem (#164).

y-vectorfield, Feb 28 '25 07:02