sunhy0316
+1 for Visual Studio 2022
Encountered the same issue, where only one model can be loaded. When loading a second model, shouldn't it first check the **actual remaining available VRAM**, rather than relying on Ollama's predicted...
Writing it in the Modelfile is a good choice.
A higher CUDA version would solve this. https://github.com/NVIDIA/cutlass/issues/1126 CUDA 12.4 works now, after updating torch.
I've encountered the same issue. Hopefully, the parameters that trigger a reload can be removed from the API and instead be set uniformly through a Modelfile or environment variables....
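As a sketch of the Modelfile approach suggested above: reload-sensitive settings such as context length or GPU layer count can be baked into a custom model instead of being passed per request. The base model name and parameter values here are illustrative assumptions, not settings from the original discussion.

```
# Hypothetical Modelfile: pin parameters at model-creation time so
# API clients don't need to pass options that force a reload.
FROM llama3
PARAMETER num_ctx 4096
PARAMETER num_gpu 33
```

Built once with `ollama create mymodel -f Modelfile`, clients can then request `mymodel` without per-call options, assuming the server honors the baked-in values.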