llama-server with cpu device is not working in docker image
```yaml
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    entrypoint: /opt/tabby/bin/tabby-cpu
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct
    volumes:
      - ".data/tabby:/data"
    ports:
      - 8080:8080
```
This configuration, which is documented at https://tabby.tabbyml.com/docs/quick-start/installation/docker-compose/, won't work:
```
tabby-1  | 2024-07-13T12:53:36.624504Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code 127
tabby-1  | 2024-07-13T12:53:36.624528Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:111: <embedding>: /opt/tabby/bin/llama-server: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory
```
Originally posted by @b-reich in https://github.com/TabbyML/tabby/issues/2082#issuecomment-2226889985
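A quick way to confirm the broken linkage from outside the container is to inspect the binary's dynamic dependencies (a diagnostic sketch; the llama-server path is taken from the log above):

```bash
# Override the entrypoint to get a shell, then list unresolved shared libraries.
# Any "not found" entry (here libcuda.so.1) means the loader cannot start the binary.
docker run --rm --entrypoint /bin/sh tabbyml/tabby -c \
  "ldd /opt/tabby/bin/llama-server | grep 'not found'"
```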
Hi, thanks for reporting the issue. As a workaround, I recommend using the Linux binary distribution directly: https://tabby.tabbyml.com/docs/quick-start/installation/linux/#download-the-release
I also encountered the same error😭
The issue seems to be related to llama-server: the `LD_LIBRARY_PATH` should be updated to something like `/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH`, and the CUDA version should be updated to 12.5. @wsxiaoys, do you want me to submit a PR?
Submitted pull request #2711. In the meantime, you can use my temporary image `0x4139/tabby-cuda` (CUDA 12.2) or `tabbyml/tabby` (CUDA 11.7) with the `LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH` environment variable.
If you're using Docker Compose, you can use the following snippet:
```yaml
version: '3.8'
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
    volumes:
      - "$HOME/.tabby:/data"
    ports:
      - 8080:8080
    environment:
      - PATH=/usr/local/cuda/bin:$PATH
      - LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
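Then bring the stack up as usual:

```bash
# Start in the background; follow the logs to confirm llama-server launches cleanly.
docker compose up -d
docker compose logs -f tabby
```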
If you're using Docker directly, you can use the following snippet:
```bash
docker run -it --gpus all \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  -e PATH=/usr/local/cuda/bin:$PATH \
  -e LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH \
  tabbyml/tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
```
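Once the server is up, a quick smoke test (the health endpoint path here is an assumption; opening the web UI on port 8080 works as a check too):

```bash
# Should return HTTP 200 once the models have loaded.
curl -i http://localhost:8080/v1/health
```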
@b-reich @kannae97 This should solve your issue.
@0x4139 nope this is not my issue. I want to run it without a GPU. Just CPU Mode.
Same error here.
For those experiencing the issue, please refer to the comment at https://github.com/TabbyML/tabby/issues/2634#issuecomment-2226992984 to see if it resolves the problem for you. If it doesn't, feel free to share your experiences. Thank you!
> @0x4139 nope this is not my issue. I want to run it without a GPU. Just CPU Mode.
The issues are related: the binary won't start even in CPU mode because the CUDA libraries are not linked. I just tested it, and it works in CPU mode as well.
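For a CPU-only run, the same `LD_LIBRARY_PATH` workaround can be combined with the `tabby-cpu` entrypoint from the original report (a sketch, assuming the stock `tabbyml/tabby` image and its bundled CUDA compat libraries):

```bash
# No --gpus flag: force CPU mode via the tabby-cpu entrypoint,
# while still letting the loader resolve the CUDA libraries llama-server links against.
docker run -it \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  -e LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat \
  --entrypoint /opt/tabby/bin/tabby-cpu \
  tabbyml/tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct
```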
@0x4139 your docker command and compose use different images.
That is the point: I mentioned that I created a temporary image with the LD path fix that works on both CPU and GPU. If the image works for you as well, @wsxiaoys will probably merge the fix.
I'm experiencing a similar issue, but for me the Docker image works fine; it's the Linux release that doesn't work.
Error:
```
⠼ 1.124 s Starting...2024-08-02T12:30:38.407959Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code 127
2024-08-02T12:30:38.408050Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:111: <embedding>: /path/to/tabby/llama-server: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
```
I'm using the command `./tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda`.
Adding the env as suggested in https://github.com/TabbyML/tabby/issues/2634#issuecomment-2244990918 doesn't help: `LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH ./tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda`
EndeavourOS, tabby 0.14.0 with NVIDIA GeForce RTX 2060 and CUDA 12.5
P.S. Is it fine to discuss it here, or should I open a new issue?
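One way to check whether the CUDA runtime is visible to the dynamic linker on the host (standard glibc tooling, not a Tabby command):

```bash
# If this prints nothing, libcudart.so.12 is not in the linker cache,
# and llama-server will fail to load exactly as in the log above.
ldconfig -p | grep libcudart
```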
Be sure you have installed the CUDA development toolkit for your Linux distribution.
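On an Arch-based distribution such as EndeavourOS, that would typically be (package name as in the Arch repositories):

```bash
sudo pacman -S cuda
```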
Thanks, that fixed it. Now I'm getting:
```
⠴ 2.006 s Starting...2024-08-08T18:29:55.586365Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
```
Seems similar to this https://github.com/TabbyML/tabby/issues/2803
Could you provide a broader view of the logs, as well as your tabby configuration?
I'm on EndeavourOS and I've downloaded tabby_x86_64-manylinux2014-cuda122 from https://github.com/TabbyML/tabby/releases/tag/v0.14.0.
The command `./tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda` produces this output:
```
⠇ 2.257 s Starting...2024-08-09T20:00:30.651731Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
⠼ 4.340 s Starting...2024-08-09T20:00:32.745170Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
⠙ 6.502 s Starting...2024-08-09T20:00:34.864970Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
⠧ 8.584 s Starting...2024-08-09T20:00:36.966576Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
⠸ 10.666 s Starting...2024-08-09T20:00:39.041895Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
⠙ 12.108 s Starting...^C
```
It just goes on forever.
Here's the `nvidia-smi` output:
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 555.58.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060        Off |   00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8              5W /  90W  |       7MiB /  6144MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1257      G   /usr/lib/Xorg                                   4MiB |
+-----------------------------------------------------------------------------------------+
```
Where can I find the tabby configuration so I can provide it? I looked at `~/.tabby`, but didn't see much there.
This seems to be related to some flags passed to the llama-server embedding process. For the time being, I think you can revert to this version: https://github.com/TabbyML/tabby/releases/download/v0.13.1/tabby_x86_64-manylinux2014-cuda122.zip. It should fix your issue.
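For reference, fetching and running that release might look like this (a sketch; it assumes the archive unpacks the tabby binary into the current directory):

```bash
curl -LO https://github.com/TabbyML/tabby/releases/download/v0.13.1/tabby_x86_64-manylinux2014-cuda122.zip
unzip tabby_x86_64-manylinux2014-cuda122.zip
chmod +x tabby
./tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
```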
Tabby builds the image with CUDA by default:
https://github.com/TabbyML/tabby/blob/48d9c084cc80d20bd439102cc0768920f6680e91/docker/Dockerfile.cuda#L42
That's why llama-cpp-server looks for libcuda and fails to start if no GPU exists; libcuda is mounted at runtime by nvidia-container-runtime.
Maybe we need a CPU Dockerfile to build a CPU-only image? It would also greatly reduce the image size by dropping the CUDA dependencies.
WDYT @wsxiaoys
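A rough sketch of what such a CPU-only image could look like (the base image, copied paths, and `TABBY_ROOT` setting here are assumptions, not the project's actual Dockerfile):

```dockerfile
# Plain Ubuntu base instead of nvidia/cuda, so nothing links against libcuda.so.1.
FROM ubuntu:22.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates && \
    rm -rf /var/lib/apt/lists/*

# Hypothetical: copy CPU-only builds of tabby and llama-server
# produced by an earlier build stage without the CUDA feature enabled.
COPY tabby llama-server /opt/tabby/bin/

ENV TABBY_ROOT=/data
ENTRYPOINT ["/opt/tabby/bin/tabby"]
```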
> For the time being, I think you can revert to this version
Thank you, that fixed the problem.
v15 was released, but I get the same error on it. Is there an issue/PR related to it, so I can monitor when it's safe to upgrade?
There is a merge request here: #2711