Patrick Devine

426 comments by Patrick Devine

@OguzcanOzdemir what's the use case for that?

@abdurahmanadilovic what's the error that you're seeing? Does it just hang? Also, what's the output of `ollama -v`?

Going to go ahead and close this. @qianjun1985 feel free to respond if it's not working and we can reopen the issue.

The Mistral finetunes should work now. It ended up being a problem with the vocab being the wrong size (in the case of dolphin-mistral, it was missing the 32,001st token)...

I just tried this with a 2xA100 on Ubuntu 22.04 and everything is working correctly:

```
$ ollama ps
NAME             ID            SIZE    PROCESSOR  UNTIL
llama3:instruct  a6990ed6be41  5.4 GB  100% GPU...
```
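For anyone who wants to check this from a script rather than the CLI, the same load information should also be available over the local REST API (a minimal sketch, assuming a recent version with the `/api/ps` endpoint and a default install listening on localhost:11434):

```
# List currently loaded models and how each is split between CPU and GPU
# (assumes the default port; adjust if you've set OLLAMA_HOST)
curl http://localhost:11434/api/ps
```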

@rk-spirinova can you update to `0.1.38` and try again? Also, what version of Linux are you running?

Going to close this as a dupe of #1881. Please try `0.1.22` and make sure you have the latest version of the model you're trying to run (you can...

OK, I've tested this out on 2x3060s and I believe everything is working. This is with the `llama3:8b-instruct-fp16` model, which splits the model across both cards:

```
$ ollama run...
```
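If anyone wants to reproduce the test, a rough sketch of what I ran (the sizes are back-of-envelope: the fp16 8B weights are roughly 16 GB, so they won't fit on a single 12 GB 3060 and get split across both cards):

```
# Pull and run the fp16 model; with two 12 GB cards the weights get split
ollama run llama3:8b-instruct-fp16

# In a second terminal, check how the load is distributed
ollama ps      # the PROCESSOR column shows the CPU/GPU split
nvidia-smi     # per-card memory usage should show both 3060s populated
```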

I'm going to go ahead and close the issue, but you guys should feel free to keep commenting.

I've updated that FAQ to cover both situations ([pre-loading models](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-pre-load-a-model-to-get-faster-response-times) as well as [controlling how long models are loaded into memory](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately)). I think people were missing this in the API...
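For anyone landing here from search, the two FAQ entries boil down to calls like these (a sketch based on the linked docs; `llama3` is just an example model name):

```
# Pre-load a model by sending an empty generate request
curl http://localhost:11434/api/generate -d '{"model": "llama3"}'

# Keep a model loaded in memory indefinitely...
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'

# ...or unload it immediately after the response
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": 0}'
```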