matteo
I tried sending data via the SMI interface to the CaribouLite. Reading the registers on the AT86RF215 modem, I can see that the TX interface (from the RPi to the CaribouLite) is...
Now it supports cyclic transfers without packet loss.
Hello. This series of patches enables the TX feature: * FPGA firmware fix * software support * test app * examples. I had to reduce the FPGA FIFO...
This adds an environment variable for launching llama.cpp with unified memory on CUDA. This is useful when the model barely fits in VRAM and inference causes OOM errors. In that...
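A minimal sketch of such a launch, assuming the variable is named `GGML_CUDA_ENABLE_UNIFIED_MEMORY` and using placeholder model path and flags (check the PR for the exact name and semantics):

```shell
# Assumed variable name (verify against the PR): GGML_CUDA_ENABLE_UNIFIED_MEMORY.
# Unified memory lets the CUDA driver page weights between VRAM and system RAM
# instead of aborting with an OOM error when the model slightly exceeds VRAM.
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-cli -m model.gguf -ngl 99 -p "Hello"
```

The trade-off is that paged access is slower than resident VRAM, so this is a fallback for near-fit models rather than a general speedup.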
### What happened?

### Problem

Some models produce corrupted output when offloading to multiple CUDA GPUs. The problem disappears when offloading to a single GPU or using CPU only....
GLM4-0414 models were using the wrong legacy template, leading to a missing [gMASK] preamble. The old code returned `LLM_CHAT_TEMPLATE_GLMEDGE`. As a workaround, you needed to launch `llama-server` with `--chat-template...`
### What is the issue?

The GLM4 model uses the wrong template, causing performance degradation. Here is the relevant pull request in llama.cpp: https://github.com/ggml-org/llama.cpp/pull/13099 The reason is that the...
### Name and Version

version: 5327 (27ebfcac)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

NVIDIA

### Models

all

###...
This PR implements handling of additional Jinja parameters, used for example to set `enable_thinking` in Qwen3 models. The official template is still only partially compatible; I modified it to use only supported...
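If the PR exposes these parameters through the server API, a request might look like the sketch below. The `chat_template_kwargs` field name, the port, and the model name are assumptions based on the PR description, not confirmed API details:

```shell
# Hypothetical request: assumes llama-server is listening on localhost:8080 and
# that the PR adds a `chat_template_kwargs` object, forwarded to the Jinja
# template, on the OpenAI-compatible /v1/chat/completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3",
        "messages": [{"role": "user", "content": "Hello"}],
        "chat_template_kwargs": {"enable_thinking": false}
      }'
```

With `enable_thinking` set to false, the template would skip emitting the model's reasoning block, matching the behavior the PR describes for Qwen3.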
### Name and Version

611aa914ef4231fab5d1ad04773c42e119ae2d2e

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

NVIDIA

### Models

Qwen3

### Problem description & steps to reproduce

I don't know if...