BUG: Gemma 3 12B - no response, only Retry
Describe the bug
It does not work with Gemma 3 12B; other apps have no problem with it.
To Reproduce Steps to reproduce the behavior:
- Select model Gemma 3 12B
- Ask a question
- Wait
Expected behavior
A normal response. Instead, the only output is Retry, and after clicking it the result is still the same.
Screenshots
Desktop (please complete the following information):
- OS: macOS 15.5
- Version 1.0.0-rc.12
Additional context
koboldcpp interacts with this model without any problems:
@cernyjan
Thanks for reaching out; let me look into this.
What specific GGUF quant are you using? I'm inferring it's from Bartowski, but could you provide me with a download URL?
Thanks 🙏
@johnbean393 sure, this one: https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/blob/main/google_gemma-3-12b-it-Q5_K_M.gguf
Thank you.
@cernyjan
I tried using a Gemma3-12B 4-bit quant from Bartowski, and inference seems to be normal. I am using an M2 Max with 32GB of unified memory.
My suspicion is that since kobold.cpp applies a default context length of 4096 tokens, it is able to fit the model plus context in GPU memory.
Therefore, let's try lowering the context length in Sidekick. Could you perform these debugging steps?
- Lower the context length to 4096 in Settings
- Launch Sidekick to load the model, then go to http://localhost:4579 to use the llama.cpp web UI directly. This will help isolate whether the model / inference framework is at fault, or whether Sidekick is not streaming the response properly.
- If step 2 fails, try selecting a smaller model such as Qwen3-1.7B. This will help isolate whether you have any hardware limitations like insufficient VRAM.
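To put rough numbers on the memory suspicion above, here is a back-of-the-envelope KV-cache estimate. The architecture parameters below are assumptions for Gemma 3 12B (layer count, GQA heads, head dimension), and the formula deliberately ignores Gemma's sliding-window attention layers, so treat it as an upper-bound sketch rather than an exact figure:

```python
# Rough KV-cache size estimate, to illustrate why lowering the context
# length can make the difference on a machine with limited unified memory.
# NOTE: the parameters below are *assumed* values for Gemma 3 12B, and
# sliding-window attention is ignored, so this is an upper-bound sketch.

N_LAYERS = 48       # assumed transformer layer count
N_KV_HEADS = 8      # assumed key/value heads (grouped-query attention)
HEAD_DIM = 256      # assumed per-head dimension
BYTES_PER_ELEM = 2  # fp16 cache entries

def kv_cache_gib(n_ctx: int) -> float:
    """Bytes for the K and V caches across all layers, converted to GiB."""
    total = 2 * N_LAYERS * n_ctx * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
    return total / 1024**3

for n_ctx in (4096, 8192, 16384):
    print(f"n_ctx={n_ctx:>6}: ~{kv_cache_gib(n_ctx):.1f} GiB KV cache")
# n_ctx=  4096: ~1.5 GiB KV cache
# n_ctx=  8192: ~3.0 GiB KV cache
# n_ctx= 16384: ~6.0 GiB KV cache
```

With the Q5_K_M weights taking roughly 8 GB on top of this, plus macOS overhead, a large default context can plausibly exhaust 16 GB of unified memory where kobold.cpp's 4096-token default still fits.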
@johnbean393
After step #1:
It works at http://localhost:4579/:
However, in the Sidekick GUI it no longer shows Retry as before; instead the application crashes without any response. On launch there is only a loading spinner for a while, then the app immediately closes, so it is not possible to delete the message, change the context length back, or even select a different model.
For reference, I have an M4 with 16GB.