Sidekick icon indicating copy to clipboard operation
Sidekick copied to clipboard

BUG: Gemma 3 12B - no response, only Retry

Open cernyjan opened this issue 10 months ago • 4 comments

Describe the bug It does not work with Gemma 3 12B - other apps does not problem with it

To Reproduce Steps to reproduce the behavior:

  1. Select model Gemma 3 12B
  2. Put some question
  3. Wait

Expected behavior response is only Retry and after click on it still the same

Screenshots Image

Desktop (please complete the following information):

  • OS: macOS 15.5
  • Version 1.0.0-rc.12

Additional context koboldcpp interacts with this model without any problems: Image

cernyjan avatar Jun 12 '25 18:06 cernyjan

@cernyjan

Thanks for reaching out –– let me look into this.

What specific GGUF quant are you using? I'm inferring it's from Bartowski, but could you provide me with a download URL?

Thanks 🙏

johnbean393 avatar Jun 13 '25 07:06 johnbean393

@johnbean393 sure, this one: https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/blob/main/google_gemma-3-12b-it-Q5_K_M.gguf

Thank you.

cernyjan avatar Jun 13 '25 14:06 cernyjan

@cernyjan

I tried using a Gemma3-12B 4-bit quant from Bartowski, and inference seems to be normal. I am using an M2 Max with 32GB of unified memory.

Image

My suspicion is that since kobold.cpp applies a default context length of 4096 tokens, it is able to fit the model plus context in GPU memory.

Therefore, let's try lowering the context length in Sidekick. Could you perform these debugging steps?

  1. Lower the context length to 4096 in Settings
Image
  1. Launch Sidekick to load the model, then go to http://localhost:4579 to use the llama.cpp web UI directly. This will help isolate whether the model / inference framework is at fault, or if Sidekick is not streaming the response properly.
  2. If step 2 fails, try selecting a smaller model such as Qwen3-1.7B. This will help isolate whether you have any hardware limitations like insufficient VRAM.

johnbean393 avatar Jun 13 '25 17:06 johnbean393

@johnbean393 After step # 1 on URI http://localhost:4579/ it works: Image Image

but if I tried on Sidekick GUI it stops showing Retry as before, but it starting to crash with application end without any response. Only loading spinner for awhile at program start and after that immediately closed, so not possible to delete message or change back context length, or even select different model.

Anyway I have M4 with 16GB.

cernyjan avatar Jun 13 '25 20:06 cernyjan