lexasub
Hello, I created a pull request for it: https://github.com/BookStackApp/BookStack/pull/4792
The crash occurs specifically when using X2Go's X server (likely Xvfb or Xorg with limited extensions), regardless of whether I open local or remote projects. Reproduction steps: Connect via X2Go...
@foldl what about GGUF support?
@charleswg, no, I get `load_tensors: offloading 64 repeating layers to GPU` (I added another PC on the LAN, but inference is still very slow and neither the CPU nor the GPU is under load) load_tensors:...
Which information (maybe a `perf`-style profile for such a task?) can I provide? [file.txt](https://github.com/user-attachments/files/18527456/file.txt)
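If profiling data would help, here is a minimal sketch of how it could be collected with Linux `perf`, assuming `perf` is installed and the server's process ID is known; the model path, prompt, and durations are placeholders, not taken from the logs above:

```bash
# Hedged sketch: collect CPU-side profiling data while reproducing the slow run.
# <PID> is the llama-server process id; adjust paths and durations as needed.

# Hardware-counter summary for a short standalone generation run
perf stat -d ./build/bin/llama-cli -m DeepSeek-R1-Distill-Qwen-32B-TQ2_0.gguf -ngl 9999 -p "hello" -n 32

# Sampled call-graph profile of the already-running server for 30 seconds
perf record -g -p <PID> -- sleep 30
perf report --stdio > perf_report.txt
```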
I also tested on one host (without RPC) with DeepSeek-R1-Distill-Qwen-32B-TQ2_0 (11 GB), but it uses only the CPU: `load_tensors: offloading 64 repeating layers to GPU` `load_tensors: offloading output layer to GPU` load_tensors:...
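One hedged way to double-check whether the offloaded layers are actually exercising the GPU during generation (standard `nvidia-smi` polling, not something taken from the logs above):

```bash
# Hedged sketch: poll GPU memory and utilization once per second while the model
# loads and answers a prompt; near-zero utilization despite the
# "offloading ... to GPU" messages would confirm the CPU-only behaviour.
nvidia-smi --query-gpu=memory.used,utilization.gpu --format=csv -l 1
```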
@charleswg I tried DeepSeek-R1-Distill-Llama-8B-Q8_0 and it works well; the problem is with DeepSeek Qwen. Maybe the issue is in the Qwen family? How can it be fixed?
@charleswg, hello again! Now I have pulled the main branch and rebuilt llama.cpp, but `./build/bin/llama-server --host 127.0.0.1 --port 8080 -m DeepSeek-R1-Distill-Qwen-32B-TQ2_0.gguf -ngl 9999` # the model loads very fast, but it is working on...
Clean build with ```cmake -DGGML_USE_CPU=OFF -GNinja -DGGML_CUDA=ON .. ; ninja -j20```. Examples of loading LLMs via ```./llama-server --host 192.168.2.109 --port 8080 -m DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf -ngl 9999``` [llama.txt](https://github.com/user-attachments/files/18623638/llama.txt) ```./llama-server --host 192.168.2.109...
> ./llama-server --host 192.168.2.109 --port 8080 -fa -c 32000 \
> -m unsloth_DeepSeek-R1-Distill-Qwen-32B/DeepSeek-R1-Distill-Qwen-32B-Q8_0.gguf \
> --rpc 10.2.0.5:1000,10.2.0.5:1001,10.2.0.5:1002,10.2.0.5:1003 \
> -ngl 60 --tensor_split 16\15\15\7\7 -b 512 -ub 256

OK, I am trying...
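For context, a minimal sketch of how the four RPC backends on 10.2.0.5 would typically be started with llama.cpp's `rpc-server` binary; the `-DGGML_RPC=ON` build flag and the per-port `CUDA_VISIBLE_DEVICES` split are assumptions about the remote setup, not taken from the comment above:

```bash
# Hedged sketch: one rpc-server per GPU/port on the remote host (10.2.0.5),
# matching the endpoints passed to --rpc on the client side.
# Assumes llama.cpp was built with -DGGML_RPC=ON on that machine.
CUDA_VISIBLE_DEVICES=0 ./build/bin/rpc-server -H 0.0.0.0 -p 1000 &
CUDA_VISIBLE_DEVICES=1 ./build/bin/rpc-server -H 0.0.0.0 -p 1001 &
CUDA_VISIBLE_DEVICES=2 ./build/bin/rpc-server -H 0.0.0.0 -p 1002 &
CUDA_VISIBLE_DEVICES=3 ./build/bin/rpc-server -H 0.0.0.0 -p 1003 &
```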