Sidekick BUG: Gemma 3 12B - no response, only Retry

Describe the bug It does not work with Gemma 3 12B - other apps does not problem with it

To Reproduce Steps to reproduce the behavior:

Select model Gemma 3 12B
Put some question
Wait

Expected behavior response is only Retry and after click on it still the same

Screenshots

Desktop (please complete the following information):

OS: macOS 15.5
Version 1.0.0-rc.12

Additional context koboldcpp interacts with this model without any problems:

Jun 12 '25 18:06 cernyjan

@cernyjan

Thanks for reaching out –– let me look into this.

What specific GGUF quant are you using? I'm inferring it's from Bartowski, but could you provide me with a download URL?

Thanks 🙏

Jun 13 '25 07:06 johnbean393

@johnbean393 sure, this one: https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/blob/main/google_gemma-3-12b-it-Q5_K_M.gguf

Thank you.

Jun 13 '25 14:06 cernyjan

@cernyjan

I tried using a Gemma3-12B 4-bit quant from Bartowski, and inference seems to be normal. I am using an M2 Max with 32GB of unified memory.

My suspicion is that since kobold.cpp applies a default context length of 4096 tokens, it is able to fit the model plus context in GPU memory.

Therefore, let's try lowering the context length in Sidekick. Could you perform these debugging steps?

Lower the context length to 4096 in Settings

Launch Sidekick to load the model, then go to http://localhost:4579 to use the llama.cpp web UI directly. This will help isolate whether the model / inference framework is at fault, or if Sidekick is not streaming the response properly.
If step 2 fails, try selecting a smaller model such as Qwen3-1.7B. This will help isolate whether you have any hardware limitations like insufficient VRAM.

Jun 13 '25 17:06 johnbean393

@johnbean393 After step # 1 on URI http://localhost:4579/ it works:

but if I tried on Sidekick GUI it stops showing Retry as before, but it starting to crash with application end without any response. Only loading spinner for awhile at program start and after that immediately closed, so not possible to delete message or change back context length, or even select different model.

Anyway I have M4 with 16GB.

Jun 13 '25 20:06 cernyjan