Results 5 comments of sunhy0316

I encountered the same issue: only one model can be loaded. When loading a second model, shouldn't it first check the **actual remaining available VRAM** rather than relying on Ollama's predicted...
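A minimal sketch of what "check the actual free VRAM first" could look like. The fit check itself is pure logic; the NVML query assumes the `pynvml` package (`pip install nvidia-ml-py`) and an NVIDIA GPU, and the 512 MiB safety margin is an illustrative choice, not anything Ollama does:

```python
def fits_in_vram(model_bytes: int, free_bytes: int,
                 margin_bytes: int = 512 * 1024**2) -> bool:
    """True if the model plus a safety margin fits in the free VRAM."""
    return model_bytes + margin_bytes <= free_bytes

def query_free_vram(gpu_index: int = 0) -> int:
    """Query the actual free VRAM in bytes via NVML (needs an NVIDIA GPU)."""
    import pynvml
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    free = pynvml.nvmlDeviceGetMemoryInfo(handle).free
    pynvml.nvmlShutdown()
    return free

if __name__ == "__main__":
    # A 7 GiB model fits in 8 GiB of free VRAM, but not in 7 GiB.
    print(fits_in_vram(7 * 1024**3, 8 * 1024**3))  # True
    print(fits_in_vram(7 * 1024**3, 7 * 1024**3))  # False
```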

Writing it in the Modelfile is a good choice.
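For instance, a Modelfile can pin parameters at model-creation time so they never need to be passed per request (a sketch; the base model name and values here are illustrative, but `num_ctx` and `num_gpu` are real Modelfile parameters):

```
# Modelfile — create with: ollama create mymodel -f Modelfile
FROM llama3
PARAMETER num_ctx 4096
PARAMETER num_gpu 99
```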

A higher CUDA version would solve this: https://github.com/NVIDIA/cutlass/issues/1126. CUDA 12.4 works now, after updating torch.

I've encountered the same issue. Hopefully the parameters that trigger a reload can be removed from the API and instead be set uniformly through a Modelfile or environment variables....
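Setting behavior uniformly through environment variables could look like the sketch below. These Ollama server variables do exist, but the values are examples only:

```shell
# Server-level settings, applied once instead of per API request.
export OLLAMA_MAX_LOADED_MODELS=2   # how many models may stay resident
export OLLAMA_NUM_PARALLEL=1        # parallel requests per model
export OLLAMA_KEEP_ALIVE=10m        # how long a model stays loaded
# then start the server in this environment: ollama serve
```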