sunhy0316
+1 for Visual Studio 2022
Encountered the same issue, where only one model can be loaded. When loading a second model, shouldn't it first check the **actual remaining available VRAM**, rather than relying on Ollama's predicted...
Writing it in the Modelfile is a good choice.
A higher CUDA version would solve this. https://github.com/NVIDIA/cutlass/issues/1126 CUDA 12.4 works now, after updating torch.
I've encountered the same issue. Hopefully, the parameters that trigger a reload can be removed from the API and instead be set uniformly through a Modelfile or environment variables....
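As a sketch of the Modelfile approach suggested above: reload-sensitive settings such as context length or GPU layer count can be baked into a custom model instead of being passed per request. The base model name and parameter values here are illustrative assumptions, not settings from the original discussion.

```
# Hypothetical Modelfile: pin parameters at model-creation time so
# API clients don't need to pass options that force a reload.
FROM llama3
PARAMETER num_ctx 4096
PARAMETER num_gpu 33
```

Built once with `ollama create mymodel -f Modelfile`, clients can then request `mymodel` without per-call options, assuming the server honors the baked-in values.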