Cannot attach images with any model.
**Describe the bug**
Cannot attach images with any model.
**Expected behavior**
From reading this page, I expected to be able to attach images in messages sent to any "vision" model. I tried several of them, but in every case Alpaca says "Image recognition is only available on specific models."
**Debugging information**

I am running the Flatpak version, with the managed Ollama instance.
> flatpak run com.jeffser.Alpaca
/app/lib/python3.12/site-packages/pydbus/registration.py:130: DeprecationWarning: Gio.DBusConnection.register_object is deprecated
ids = [bus.con.register_object(path, interface, wrapper.call_method, None, None) for interface in interfaces]
INFO [main.py | main] Alpaca version: 6.0.5
INFO [instance_manager.py | start] Starting Alpaca's Ollama instance...
INFO [instance_manager.py | start] Started Alpaca's Ollama instance
2025/05/04 22:46:46 routes.go:1233: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:1 HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/lonj/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:1 http_proxy: https_proxy: no_proxy:]"
INFO [instance_manager.py | start] client version is 0.6.7
time=2025-05-04T22:46:46.141+02:00 level=INFO source=images.go:458 msg="total blobs: 36"
time=2025-05-04T22:46:46.142+02:00 level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-05-04T22:46:46.142+02:00 level=INFO source=routes.go:1300 msg="Listening on [::]:11435 (version 0.6.7)"
time=2025-05-04T22:46:46.142+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-05-04T22:46:46.152+02:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-05-04T22:46:46.152+02:00 level=WARN source=amd_linux.go:309 msg="amdgpu too old gfx803" gpu=0
time=2025-05-04T22:46:46.152+02:00 level=INFO source=amd_linux.go:402 msg="no compatible amdgpu devices detected"
time=2025-05-04T22:46:46.152+02:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-05-04T22:46:46.152+02:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="31.3 GiB" available="12.9 GiB"
[GIN] 2025/05/04 - 22:46:46 | 200 | 2.030354ms | 127.0.0.1 | GET "/api/tags"
time=2025-05-04T22:46:46.170+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-04T22:46:46.175+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-04T22:46:46.179+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-04T22:46:46.180+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-04T22:46:46.181+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/04 - 22:46:46 | 200 | 21.953705ms | 127.0.0.1 | POST "/api/show"
time=2025-05-04T22:46:46.189+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-04T22:46:46.192+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/04 - 22:46:46 | 200 | 26.851194ms | 127.0.0.1 | POST "/api/show"
time=2025-05-04T22:46:46.194+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/04 - 22:46:46 | 200 | 36.925483ms | 127.0.0.1 | POST "/api/show"
time=2025-05-04T22:46:46.217+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-04T22:46:46.219+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-04T22:46:46.226+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-04T22:46:46.231+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-04T22:46:46.240+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/04 - 22:46:46 | 200 | 67.946048ms | 127.0.0.1 | POST "/api/show"
time=2025-05-04T22:46:46.252+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-04T22:46:46.253+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/04 - 22:46:46 | 200 | 81.234067ms | 127.0.0.1 | POST "/api/show"
time=2025-05-04T22:46:46.262+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/04 - 22:46:46 | 200 | 93.761884ms | 127.0.0.1 | POST "/api/show"
time=2025-05-04T22:46:46.264+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/04 - 22:46:46 | 200 | 95.508274ms | 127.0.0.1 | POST "/api/show"
[GIN] 2025/05/04 - 22:46:54 | 200 | 829.472µs | 127.0.0.1 | GET "/api/tags"
INFO [window.py | show_toast] Image recognition is only available on specific models
Which model(s) did you try? Gemma 3, for example, only supports vision with 4B parameters and higher. The 1B version does not support vision.
LLaMa3.2 Vision 11B, LLaVa 7B, and Moondream 1.8B. I will try Gemma 3 4B.
Ok, Gemma 3 4B works.
> LLaMa3.2 Vision 11B, LLaVa 7B, and Moondream 1.8B. I will try Gemma 3 4B
Weird, those should totally work as well... thanks for the report. I'll try to look into this when I have some time.
Okay, apparently Gemma 3 is one of the few models that lists "vision" twice in its categories. I'm investigating whether that's the cause.
Okay, that's not it. The information is requested from the Ollama instance at runtime and not hard-coded.
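For context, the log above shows Alpaca issuing `POST /api/show` requests at startup, which is where this capability information comes from. A minimal sketch of that kind of check, assuming the `capabilities` and `details.families` fields that recent Ollama versions return from `/api/show` (field names are assumptions, not Alpaca's actual code):

```python
import json
import urllib.request

# Port of Alpaca's managed Ollama instance, per OLLAMA_HOST in the log above.
OLLAMA = "http://127.0.0.1:11435"

def has_vision(show_response: dict) -> bool:
    """Heuristic vision check on an /api/show response (assumed fields)."""
    if "vision" in show_response.get("capabilities", []):
        return True
    # Fallback: multimodal models typically ship a vision tower,
    # reported as a family such as "clip" or "mllama".
    families = show_response.get("details", {}).get("families") or []
    return any(f in ("clip", "mllama") for f in families)

def show(model: str) -> dict:
    """Fetch model metadata from the running Ollama instance."""
    req = urllib.request.Request(
        f"{OLLAMA}/api/show",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Running `has_vision(show("llava:7b"))` against the affected instance would show whether Ollama itself reports the capability, which would narrow the bug down to either side.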
I can confirm the issue. When I select the file to attach to the prompt, Alpaca reports only these files as compatible:
This happens with every model that OpenRouter gives access to.
Same👍
Just to add: I tried MedGemma (based on Gemma 3), and at least the Unsloth GGUF (https://huggingface.co/unsloth/medgemma-4b-it-GGUF) is not recognised as a vision model, even though it should be capable of it.
For those who are really troubled by this bug, there is a temporary workaround: attach the image with Gemma 3 as the selected model, then switch to your desired vision model before sending the message.
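Another way to bypass the UI check entirely is to send the image straight to the managed Ollama instance over its REST API (port 11435, per the log earlier in this thread). A sketch following the public Ollama `/api/chat` format, which accepts base64-encoded images on a message; treat the exact field names as assumptions for your Ollama version:

```python
import base64
import json
import urllib.request

# Alpaca's managed instance, per OLLAMA_HOST in the log above.
OLLAMA = "http://127.0.0.1:11435"

def build_vision_payload(model: str, prompt: str, image_path: str) -> dict:
    """Build an /api/chat request body with one base64-encoded image."""
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt, "images": [img_b64]}],
    }

def chat(payload: dict) -> str:
    """POST the payload to /api/chat and return the reply text."""
    req = urllib.request.Request(
        f"{OLLAMA}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

For example, `chat(build_vision_payload("llava:7b", "Describe this image.", "photo.png"))` talks to the same models Alpaca manages, just without the client-side capability check.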