Patrick Devine
@Picaso2 other than the multimodal models, we don't _yet_ support loading multiple models into memory simultaneously. What use case are you trying to address?
Sorry for the slow response. This did get fixed a while back but the issue never got updated. Here's an example:

```
% ./ollama run llava:13b "Describe this image: /Users/pdevine/Pictures/steve.png"
...
```
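The same thing works over the REST API by passing the image as base64 in the `images` field. A minimal sketch (the image path is a placeholder):

```
# minimal sketch: describe an image with a multimodal model via the API
# (image path is a placeholder; tr -d '\n' keeps the base64 on one line)
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava:13b\",
  \"prompt\": \"Describe this image:\",
  \"images\": [\"$(base64 < image.png | tr -d '\n')\"]
}"
```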
Hi @byteconcepts, sorry for the slow response. I just pulled and was successful:

```
% ./ollama pull dolphin-mixtral:8x7b-v2.5-q3_K_L
pulling manifest
pulling a69e225da78e... 100% ▕█████████████████████████████████████████████████████████████████▏ 20 GB
pulling 43070e2d4e53... 100%
...
```
I think you were just running out of memory when you were trying to run the model. We've made several changes to how we handle memory since you filed this,...
We actually changed the docs on this a while back to not use the docker image for quantizing. You can see it [here](https://github.com/ollama/ollama/blob/main/docs/import.md#quantize-the-model). I have been working on a new...
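For anyone landing here later, the non-Docker flow in those docs amounts to building llama.cpp's `quantize` tool and pointing it at an f16 GGUF. A rough sketch, with placeholder paths and quantization type; the import.md link above has the authoritative steps:

```
# rough sketch of quantizing without the docker image, using llama.cpp
# (paths and the q4_0 type are placeholders; see import.md for details)
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make quantize
./quantize /path/to/model-f16.gguf /path/to/model-q4_0.gguf q4_0
```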
Hey @adriens, is the animation rendering incorrectly? I'm just wondering what the use case is.
@adriens Have you tried out the new [python bindings](https://github.com/ollama/ollama-python)?
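For anyone else finding this, a minimal sketch of trying the bindings out (the model name is a placeholder, and this assumes you've already pulled it):

```
# minimal sketch: install the python bindings and run a one-shot chat
pip install ollama
python - <<'EOF'
import ollama

# one-shot chat against a locally pulled model (placeholder name)
resp = ollama.chat(model='llama2', messages=[
    {'role': 'user', 'content': 'Why is the sky blue?'},
])
print(resp['message']['content'])
EOF
```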
@adriens are you OK with closing the issue? I'm not sure we need this, but we can always reopen it in the future.
I've submitted #2179
This should be working better now: Ollama will offload a portion of the model to the GPU and run the rest on the CPU. Can you test again with Ollama version 0.1.28? There...
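If you want to control the split yourself, the `num_gpu` option sets how many layers get offloaded; a minimal sketch against the REST API (the model name and layer count are placeholders):

```
# minimal sketch: pin the number of layers offloaded to the GPU
# via the num_gpu option (model name and count are placeholders)
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "options": { "num_gpu": 20 }
}'
```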