Patrick Devine

426 comments by Patrick Devine

@OguzcanOzdemir what's the use case for that?

@abdurahmanadilovic what's the error that you're seeing? Does it just hang? Also, what's the output of `ollama -v`?

Going to go ahead and close this. @qianjun1985 feel free to respond if it's not working and we can reopen the issue.

The Mistral finetunes should work now. It ended up being a problem with the vocab being the wrong size (in the case of dolphin-mistral, it was missing the 32,001st token)...

I just tried this with a 2xA100 on Ubuntu 22.04 and everything is working correctly:

```
$ ollama ps
NAME             ID            SIZE    PROCESSOR  UNTIL
llama3:instruct  a6990ed6be41  5.4 GB  100% GPU...
```
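For anyone who wants to check this from a script rather than the CLI, the same load information should also be available over the local REST API (a minimal sketch, assuming a recent version with the `/api/ps` endpoint and a default install listening on localhost:11434):

```
# List currently loaded models and how each is split between CPU and GPU
# (assumes the default port; adjust if you've set OLLAMA_HOST)
curl http://localhost:11434/api/ps
```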

@rk-spirinova can you update to `0.1.38` and try again? Also, what version of Linux are you running?

Going to close this as a dupe of #1881. Please try `0.1.22` and make sure you have the latest version of the model you're trying to run (you can...

OK, I've tested this out on 2x3060s and I believe everything is working. This is with the `llama3:8b-instruct-fp16` model, which splits the model across both cards:

```
$ ollama run...
```
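If anyone wants to reproduce the test, a rough sketch of what I ran (the sizes are back-of-envelope: the fp16 8B weights are roughly 16 GB, so they won't fit on a single 12 GB 3060 and get split across both cards):

```
# Pull and run the fp16 model; with two 12 GB cards the weights get split
ollama run llama3:8b-instruct-fp16

# In a second terminal, check how the load is distributed
ollama ps      # the PROCESSOR column shows the CPU/GPU split
nvidia-smi     # per-card memory usage should show both 3060s populated
```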

I'm going to go ahead and close the issue, but you guys should feel free to keep commenting.

I've updated that FAQ to cover both situations ([pre-loading models](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-pre-load-a-model-to-get-faster-response-times) as well as [controlling how long models are loaded into memory](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately)). I think people were missing this in the API...
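For anyone landing here from search, the two FAQ entries boil down to calls like these (a sketch based on the linked docs; `llama3` is just an example model name):

```
# Pre-load a model by sending an empty generate request
curl http://localhost:11434/api/generate -d '{"model": "llama3"}'

# Keep a model loaded in memory indefinitely...
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'

# ...or unload it immediately after the response
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": 0}'
```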