Patrick Devine

Showing 426 comments by Patrick Devine

@MHugonKaliop you can set `OLLAMA_KEEP_ALIVE=-1m` to prevent the model from ever being unloaded. It's probably stuck in the `Stopping...` state because it is trying to unload the...
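For anyone setting this up, keep-alive can be configured on the server via an environment variable; any negative duration keeps the model loaded indefinitely. A minimal sketch (how you set the variable will vary by install, e.g. systemd overrides on Linux):

```shell
# Keep loaded models in memory indefinitely (any negative duration works)
export OLLAMA_KEEP_ALIVE=-1m
ollama serve
```

The same behavior can also be requested per call with the `keep_alive` parameter on the API's generate/chat endpoints.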

I've found opening a new chat works as well (w/o having to close/reopen the app).

You can also use `ollama ps` to see if some of it is being loaded into system memory instead of onto the GPU. Unfortunately your GPU has just _barely_ enough...
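To check where a model's weights actually ended up, `ollama ps` reports the split in its PROCESSOR column (fully on GPU vs. partially offloaded to system memory):

```shell
# Lists currently loaded models; the PROCESSOR column shows the CPU/GPU
# split, and SIZE shows how much memory the loaded model occupies.
ollama ps
```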

I'm going to go ahead and close this. You shouldn't need to specify the `TEMPLATE` as it should get autodetected.
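For context, importing weights without a `TEMPLATE` directive looks like the sketch below; the chat template is picked up from the model's own metadata. The path here is a hypothetical example:

```
# Modelfile — TEMPLATE is omitted on purpose; it is autodetected
# from the model's metadata
FROM /path/to/model-directory
```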

I ended up rebuilding the q4_1 weights and still ran into issues. After talking with the Llama team, it sounds like the model is really sensitive to certain quantizations, although they didn't give...

We ended up removing the quantization. I think there was probably an issue also w/ the kv cache. There are some changes coming to improve kv cache performance and I'm...

Going to close this as a dupe.

The safetensors architectures that are currently supported are:

- Llama 2 and 3 (not the vision models yet unfortunately)
- Gemma 1 and 2
- Bert
- Mixtral
- Phi3
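Importing one of the supported architectures is done with `ollama create` pointed at a Modelfile whose `FROM` references the local safetensors directory. A sketch with hypothetical names:

```shell
# Build an Ollama model from a local safetensors checkpoint,
# then run it to verify the import worked
ollama create my-model -f ./Modelfile
ollama run my-model
```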