FrenzyBiscuit
### What is the issue?
I am on the 0.5.0 release (which links to 0.4.8-rc0), using Qwen 2.5 32B Q5 with 32K context and flash attention with q8_0 KV...
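For context, flash attention and a quantized KV cache are switched on in Ollama through server environment variables; a minimal sketch of a setup matching this report (variable names as documented by Ollama, everything else assumed):

```shell
# Enable flash attention (a prerequisite for a quantized KV cache)
export OLLAMA_FLASH_ATTENTION=1
# Quantize the KV cache to q8_0, as in this report
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama serve
```

The 32K context would then be set per request (e.g. via `num_ctx`) or in the Modelfile.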
### What is the issue?
I have the following setup:
- 7950X3D (AMD iGPU)
- 3090 + 2080 Ti

When using Ollama with open-webui, the GPU (3090) gets used BRIEFLY. It starts using...
### Describe the bug
I am using AIO with the following:
--env NEXTCLOUD_ENABLE_NVIDIA_GPU=true

I can also confirm Docker works correctly with NVIDIA, as the below command works:
`docker run --gpus...
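A fuller form of the usual GPU smoke test looks like the sketch below (the CUDA image tag is an assumption; any image that ships `nvidia-smi` works):

```shell
# Confirm the NVIDIA container runtime is wired up:
# the container should print the host GPU table
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```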
### Environment
🐧 Linux
### System
Fedora 42
### Version
1.12.13
### Desktop Information
NodeJS version 22.14.0. Python 3.12 is installed with a venv for tabbyapi; system default is 3.13....
### Have you searched for similar requests?
Yes
### Is your feature request related to a problem? If so, please describe.
Can we get support for a reranker model for...
### Have you searched for similar requests?
No
### Is your feature request related to a problem? If so, please describe.
Currently you can only upload one file at a...
### Check Existing Issues
- [x] I have searched the existing issues and discussions.

### Problem Description
Please consider adding an option for individual users to use full context mode...
### Your current environment
.
### Model Input Dumps
_No response_
### 🐛 Describe the bug
Aphrodite crashes when this setting is enabled in SillyTavern and someone uses text completion....
### Your current environment
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Gentoo Linux (x86_64)...
### Your current environment
In single-user mode with a single request, chunked prefill works on FLASHINFER and I am able to hit 160K FP8 context. When multiple...
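A launch roughly matching this report might look like the sketch below; the CLI entry point, flag names, and model are all assumptions based on the vLLM-style engine options Aphrodite exposes, not taken from the report itself:

```shell
# Hypothetical reproduction: FLASHINFER backend, chunked prefill,
# FP8 KV cache, and a very long context window.
APHRODITE_ATTENTION_BACKEND=FLASHINFER \
aphrodite run some/model \
  --enable-chunked-prefill \
  --kv-cache-dtype fp8 \
  --max-model-len 160000
```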