LocalAI requires AVX2 on CPU — rpc error on older CPUs
LocalAI version: v3.5.4
All image variants have the same problem (AIO, CPU-only, CUDA 11/12).
Environment, CPU architecture, OS, and Version:
VM: No
OS: Arch Linux x86_64
Kernel: 6.16.8-arch3-1
CPU: Intel Xeon E5-2696 v2 (24) @ 3.500GHz
GPU: NVIDIA GeForce RTX 2080 Ti Rev. A
Memory: 14289MiB / 32066MiB
Describe the bug
I've got the following docker-compose file. It works on a system that has AVX2, but does not work on a system which only has AVX.
services:
  api:
    container_name: localai
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    # For a specific version:
    #image: localai/localai:v3.5.4-aio-cpu
    # For Nvidia GPUs decomment one of the following (cuda11 or cuda12):
    # image: localai/localai:v3.5.4-aio-gpu-nvidia-cuda-11
    # image: localai/localai:v3.5.4-aio-gpu-nvidia-cuda-12
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
    # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    runtime: nvidia
    ports:
      - 33866:8080
    environment:
      - DEBUG=true
      - NVIDIA_VISIBLE_DEVICES=all
      # ...
    volumes:
      - models:/models:cached
      - backends:/backends
      - user-backends:/usr/share/localai/backends

volumes:
  models:
  backends:
  user-backends:
To Reproduce
See the docker-compose file above and run it on a system with AVX2 and on a system without AVX2.
With AVX2: works
Without AVX2: does not work
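For reference, which case a host falls into can be checked directly from the kernel's CPU flags; something like this on each system shows the difference:

grep -qw avx2 /proc/cpuinfo && echo "AVX2 supported" || echo "AVX2 NOT supported"
grep -qw avx  /proc/cpuinfo && echo "AVX supported"  || echo "AVX NOT supported"

On the affected system only the second line reports support, matching the "no AVX2 found" line in the logs below.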
Expected behavior
It would be nice if LocalAI worked on systems without AVX2 CPU support.
Logs
LocalAI:
localai | CPU info:
localai | model name : Intel(R) Xeon(R) CPU E5-2696 v2 @ 2.50GHz
localai | flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi
localai | CPU: AVX found OK
localai | CPU: no AVX2 found
localai | CPU: no AVX512 found
...
localai | 9:01PM DBG GRPC(gpt-4-127.0.0.1:33445): stderr llama_context: CPU output buffer size = 0.49 MiB
localai | 9:01PM DBG GRPC(gpt-4-127.0.0.1:33445): stderr llama_kv_cache: CPU KV buffer size = 896.00 MiB
localai | 9:01PM DBG GRPC(gpt-4-127.0.0.1:33445): stderr llama_kv_cache: size = 896.00 MiB ( 8192 cells, 28 layers, 1/1 seqs), K (f16): 448.00 MiB, V (f16): 448.00 MiB
localai | 9:01PM ERR Failed to load model gpt-4 with backend llama-cpp error="failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF" modelID=gpt-4
localai | 9:01PM DBG No choices in the response, skipping
dmesg:
[100539.252358] traps: grpcpp_sync_ser[1047414] trap invalid opcode ip:7f52c87a194b sp:7f5284ff1110 error:0 in llama-cpp-avx[7a194b,7f52c80d0000+1358000]
[233335.656413] traps: grpcpp_sync_ser[326615] trap invalid opcode ip:7f21765a191b sp:7f2125ff3110 error:0 in llama-cpp-fallback[7a191b,7f2175ed0000+1334000]
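As a side note, the dmesg traces above already point at the faulting instruction: the first bracketed value (e.g. 7a194b for the llama-cpp-avx crash) appears to be the offset of the instruction pointer within the mapped file. Assuming that offset roughly corresponds to the virtual addresses objdump prints for the PIE backend binary, the offending instruction can be inspected with something like the following (the backend binary's path inside the container is not shown in the logs, so adjust it accordingly):

objdump -d --start-address=0x7a1900 --stop-address=0x7a1980 ./llama-cpp-avx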
I've also tried to rebuild LocalAI & the backends with multiple compiler flags (e.g. the ones from the FAQ: https://localai.io/faq/#im-getting-a-sigill-error-whats-wrong), but that did not help. I also tried a CPU-only build to exclude any GPU issues, with no luck either.
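For reference, the flags I mean were along these lines (a sketch based on the FAQ; the GGML_* names are the current ggml/llama.cpp CMake switches, older trees use the LLAMA_ prefix, and the exact make target may differ between LocalAI versions):

CMAKE_ARGS="-DGGML_AVX2=OFF -DGGML_FMA=OFF -DGGML_F16C=OFF -DGGML_AVX512=OFF" make build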
Update: the main local-ai binary also contains AVX2 instructions, causing a crash on AVX-only CPUs
Hello,
Following up on my previous post, I've uncovered a deeper issue.
After correcting the backend/cpp/llama-cpp/run.sh script to select the correct llama-cpp-fallback backend, the application still crashed on my AVX-only CPU.
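For context, by "select the correct backend" I mean dispatch logic roughly like the following minimal sketch (the real run.sh differs; llama-cpp-avx2 is assumed here as the name of the AVX2 variant, while llama-cpp-avx and llama-cpp-fallback are the names visible in the dmesg traces above):

#!/bin/sh
# Pick the most capable llama.cpp variant that the host CPU actually supports.
BINARY=llama-cpp-fallback
if grep -qw avx2 /proc/cpuinfo; then
    BINARY=llama-cpp-avx2
elif grep -qw avx /proc/cpuinfo; then
    BINARY=llama-cpp-avx
fi
exec ./"$BINARY" "$@"

Even with the fallback binary selected this way, the application still crashed, which led me to the finding below.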
New Finding: The main local-ai binary requires AVX2
I investigated the main application binary itself (/local-ai) and discovered that it was also compiled with AVX2 instructions. This is the ultimate root cause of the crash, as the application fails with an "illegal instruction" error before it can even fully load a backend.
Diagnosis
I confirmed this by running the following command inside the container, which looks for 256-bit ymm-register instructions in the main binary:

objdump -d /local-ai | grep -q 'ymm' && echo "ymm (AVX/AVX2) instructions found in local-ai binary"

The output confirmed the presence of such instructions.
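Since plain AVX also uses ymm registers, a stricter check is to grep for mnemonics that only exist in AVX2; any hit means the binary cannot run on an AVX-only CPU:

# Count instructions that are AVX2-only (these mnemonics do not exist in plain AVX).
objdump -d /local-ai | grep -Ec 'vpermq|vperm2i128|vpbroadcast|vpgather|vextracti128'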
Conclusion & Recommendation
The problem is twofold:
- The llama-cpp-avx backend binary is incorrectly compiled with AVX2 instructions.
- More critically, the main local-ai application binary is also compiled with AVX2 instructions, making it incompatible with a significant range of CPUs that only support AVX.

To fully resolve this, the core application's build process needs to be modified. A build of the local-ai binary without AVX2 optimizations is required to support hardware that lacks this feature set. This would be analogous to the existing llama-cpp-fallback backend. I hope this comprehensive diagnosis is helpful in creating a permanent fix.
I have the same problem with an E5-2650 v2 CPU. It would be nice to have LocalAI working on older CPUs.