Eval bug: Segmentation fault with Docker ROCm image "full-rocm"
Name and Version
Docker Image: ghcr.io/ggerganov/llama.cpp:full-rocm 4fbeb701689e
root@5de0b21ea186:/app# ./llama-cli --version
version: 0 (unknown)
built with AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac) for x86_64-unknown-linux-gnu
Operating systems
Linux
GGML backends
Unknown (ROCm?)
Hardware
CPU: AMD Ryzen 7 7700, GPU: AMD Radeon RX 7800 XT
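The RX 7800 XT is a Navi 32 part and reports a gfx1101 ISA, which is presumably why HSA_OVERRIDE_GFX_VERSION=11.0.0 (gfx1100) is set in the compose files below. A quick sanity check of what the ROCm runtime actually sees (a sketch, assuming rocminfo is available on the host or inside the container):

rocminfo | grep -i gfx
# expect gfx1101 for the 7800 XT; with HSA_OVERRIDE_GFX_VERSION=11.0.0
# exported, the HSA runtime reports the GPU as gfx1100 instead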
Models
https://huggingface.co/chatpdflocal/Qwen2.5.1-Coder-14B-Instruct-GGUF/blob/main/Qwen2.5.1-Coder-14B-Instruct-Q4_K_M.gguf
Problem description & steps to reproduce
llama-server crashes with a segmentation fault a few seconds after startup. I'm not sure which GGML backend is being used; I could not find a way to determine which backend llama.cpp uses with ROCm.
Docker container log: docker-llama.log.
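For reference, the ROCm build of llama.cpp uses the HIP port of the CUDA backend, so the active backend should be visible in the startup log. A quick way to check from the host (a sketch, assuming the container name from the compose file below; the exact wording of the log lines varies by version):

docker logs llama 2>&1 | grep -i -E 'ggml_cuda_init|rocm|device'
# a working HIP/ROCm backend typically prints something like:
#   ggml_cuda_init: found 1 ROCm devices:
#     Device 0: AMD Radeon RX 7800 XT ...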
docker-compose.yml
llama:
  container_name: llama
  image: ghcr.io/ggerganov/llama.cpp:full-rocm
  restart: "no"
  privileged: true
  devices:
    - /dev/dri/card0:/dev/dri/card0
    - /dev/kfd:/dev/kfd
  ports:
    - "127.0.0.1:8080:8080"
  environment:
    - 'HSA_OVERRIDE_GFX_VERSION=11.0.0'
  volumes:
    - /opt/docker/volumes/llama-root/volume-models:/models
  command: "--server --model /models/Qwen2.5.1-Coder-14B-Instruct-Q4_K_M.gguf --ctx-size 8192 --port 8080 --parallel 1 --threads -1 --mlock --flash-attn --gpu-layers 100"
For comparison, the Ollama ROCm Docker image runs fine with the same device mappings and the same HSA override:
ollama:
  container_name: ollama
  image: ollama/ollama:rocm
  restart: "no"
  privileged: true
  devices:
    - /dev/dri/card0:/dev/dri/card0
    - /dev/kfd:/dev/kfd
  ports:
    - "127.0.0.1:11434:11434"
  environment:
    - 'HSA_OVERRIDE_GFX_VERSION=11.0.0'
  volumes:
    - /opt/docker/volumes/ollama:/root/.ollama
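Since Ollama works with the same devices and override, one way to narrow the llama.cpp crash down is to take the GPU out of the picture and run CPU-only (a sketch; --gpu-layers 0 keeps all layers on the CPU):

docker compose run --rm llama \
  --server --model /models/Qwen2.5.1-Coder-14B-Instruct-Q4_K_M.gguf \
  --ctx-size 8192 --gpu-layers 0

If the segfault disappears with --gpu-layers 0, that points at the HIP/ROCm backend or the gfx override rather than at the model file.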
First Bad Commit
No response
Relevant log output
Docker container logs: see the attached docker-llama.log above.