FrenzyBiscuit
### What is the issue?
I am on the 0.5.0 release (which links to 0.4.8-rc0), using Qwen 2.5 32B Q5 with 32K context and flash attention with q8_0 KV...
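For context, flash attention and a quantized KV cache are switched on in Ollama through server environment variables; a minimal sketch of a setup matching this report (variable names as documented by Ollama, everything else assumed):

```shell
# Enable flash attention (a prerequisite for a quantized KV cache)
export OLLAMA_FLASH_ATTENTION=1
# Quantize the KV cache to q8_0, as in this report
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama serve
```

The 32K context would then be set per request (e.g. via `num_ctx`) or in the Modelfile.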
### What is the issue?
I have the following setup:
- 7950X3D (AMD iGPU)
- 3090 + 2080 Ti

When using Ollama with open-webui, the GPU (3090) gets used BRIEFLY. It starts using...
### Describe the bug
I am using AIO with the following:
--env NEXTCLOUD_ENABLE_NVIDIA_GPU=true

I can also confirm Docker works correctly with NVIDIA, as the below command works:
`docker run --gpus...
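A fuller form of the usual GPU smoke test looks like the sketch below (the CUDA image tag is an assumption; any image that ships `nvidia-smi` works):

```shell
# Confirm the NVIDIA container runtime is wired up:
# the container should print the host GPU table
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```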
### Environment
🐧 Linux
### System
Fedora 42
### Version
1.12.13
### Desktop Information
NodeJS version 22.14.0. Python 3.12 is installed with a venv for tabbyapi; system default is 3.13....
### Have you searched for similar requests?
Yes
### Is your feature request related to a problem? If so, please describe.
Can we get support for a reranker model for...
### Have you searched for similar requests?
No
### Is your feature request related to a problem? If so, please describe.
Currently you can only upload one file at a...
### Check Existing Issues
- [x] I have searched the existing issues and discussions.

### Problem Description
Please consider adding an option for individual users to use full context mode...
### Your current environment
.
### Model Input Dumps
_No response_
### 🐛 Describe the bug
Aphrodite crashes when this setting is enabled in SillyTavern and someone uses text completion....
### Your current environment
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Gentoo Linux (x86_64)...
### Your current environment
In single-user mode with a single request, chunked prefill works on FLASHINFER and I am able to hit 160K FP8 context. When multiple...
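A launch roughly matching this report might look like the sketch below; the CLI entry point, flag names, and model are all assumptions based on the vLLM-style engine options Aphrodite exposes, not taken from the report itself:

```shell
# Hypothetical reproduction: FLASHINFER backend, chunked prefill,
# FP8 KV cache, and a very long context window.
APHRODITE_ATTENTION_BACKEND=FLASHINFER \
aphrodite run some/model \
  --enable-chunked-prefill \
  --kv-cache-dtype fp8 \
  --max-model-len 160000
```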