Patrick Devine

426 comments by Patrick Devine

> @pdevine I tried Gemma with 0.1.33-rc5 version. It works now but is slow. I see in the server logs that not all the layers are sent to the GPU....

@necro304 Does this happen with all models, or only llama2? What are the specs for your machine, what version of ollama are you running (use `ollama --version`), and what version...

I believe this ended up being fixed a while ago. The most recent version of ollama is 0.1.28. The llama2 model that you have should still be the latest. I'm...

Hey @knoopx, you can actually do this by calling `curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'` (not with `-1`, which will always leave the model loaded). That will immediately unload...
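The curl call above can be sketched programmatically as well. A minimal example of building that unload request body, assuming the default local server address from the comment; the payload is constructed here rather than actually sent:

```python
import json

# Request body for POST /api/generate that unloads the model immediately:
# keep_alive of 0 evicts "llama2" from memory as soon as the call returns,
# whereas -1 would keep it loaded indefinitely.
unload_request = {"model": "llama2", "keep_alive": 0}

# With a running server this would be sent as:
#   curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
body = json.dumps(unload_request)
print(body)
```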

We started with Linux/macOS, and I had shoved a colon into the name, not realizing that NTFS didn't support colons in file names. I also didn't anticipate so many people...

Sorry for the slow response, guys. There's actually an [FAQ](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-set-them-to-a-different-location) which explains how to do this. *The short answer* is to use the `OLLAMA_MODELS` environment variable if you want to put...
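A minimal sketch of relocating the model store via `OLLAMA_MODELS`, the variable the FAQ describes; the directory path here is purely illustrative, and the server launch is shown only as a comment:

```python
import os

# Point ollama at a custom models directory before the server starts.
# The path is an example; any writable directory works.
os.environ["OLLAMA_MODELS"] = "/data/ollama/models"

# A server process launched from this environment would read and write
# models under the new location, e.g.:
#   subprocess.Popen(["ollama", "serve"], env=os.environ)
print(os.environ["OLLAMA_MODELS"])
```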

You shouldn't need to delete any of the files manually. If you stop the ollama service and restart it, it should clean up any dangling files. You can also change...

#2146 adds this, which will be available in `0.1.23`. Going to go ahead and close this. You can set `keep_alive` to `-1` when calling the chat API and it will...
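The chat-API usage described above can be sketched as follows; the request body is only constructed here, not sent, and the example prompt is made up for illustration:

```python
import json

# Chat request that pins the model in memory: keep_alive of -1 disables
# the idle-unload timeout entirely.
chat_request = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "why is the sky blue?"}],
    "keep_alive": -1,
}

# With a running server this would be sent to the chat endpoint:
#   curl http://localhost:11434/api/chat -d '<this JSON>'
print(json.dumps(chat_request))
```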

@nathanleclaire I've been thinking about adding an `OLLAMA_KEEP_ALIVE` env variable to be able to change the default timeout. I don't want to go too extreme here, though, because ideally there...