Add keep alive to the embedding model config
Validations
- [X] I believe this is a way to improve. I'll try to join the Continue Discord for questions
- [X] I'm not able to find an open issue that requests the same enhancement
Problem
The other model configurations allow setting keepAlive (which is really useful with Ollama), so it would be nice to have that on the embedding model as well.
Solution
Add a keepAlive option under the embed model config.
I don't know if this is relevant for you, as I don't know your use case, but for anyone in my situation: I wanted to prevent the embedding model from unloading my big LLM every time (it takes ~95% of my GPU). So, here is a simple workaround:
- set the OLLAMA_KEEP_ALIVE env var to -1 (a global setting for all models)
- set num_gpu to 0 in the Ollama Modelfile for the embedding model
That way:
- the embedding model always stays loaded in RAM (which isn't really impactful, as embedding models are generally very light)
- the Big model stays loaded in VRAM :)
https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately
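Concretely, the workaround above can be sketched like this (the embedding model name `nomic-embed-text` and the target name `nomic-embed-cpu` are just examples; substitute your own):

```shell
# Global setting: keep every Ollama model loaded indefinitely
export OLLAMA_KEEP_ALIVE=-1

# Write a Modelfile that pins the embedding model to CPU (num_gpu 0),
# so it lives in RAM and never competes with the big LLM for VRAM
cat > Modelfile <<'EOF'
FROM nomic-embed-text
PARAMETER num_gpu 0
EOF

# Then register it with: ollama create nomic-embed-cpu -f Modelfile
```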
OK, deleting my comment as it's not relevant: I see Ollama doesn't actually load embedding models onto the GPU, even if you set num_gpu to 10 or so. So my comment was confusing!
@blakkd thank you anyway, I didn't know that. My use case is that I want to free the RAM back on my M3 Mac while the model isn't running. Even though the models aren't that big, it would still be nice to have the memory for other applications without having to close VS Code/whatever is using Continue.
Oh I see. But then why do you want to set keep_alive? It's meant to do exactly the opposite :thinking: However, maybe setting a duration shorter than the default (5 minutes from the last inference) could help you.
Yes, that's what I want: set it to 60s, which is a reasonable time between file saves while editing and enough to know that I'm no longer coding.
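For reference, per the Ollama FAQ linked earlier, keep_alive accepts a duration string like "60s" or "5m", a plain number of seconds, -1 to keep the model loaded indefinitely, or 0 to unload it immediately after the response. A per-request sketch (the model name is just an example):

```json
{ "model": "qwen3:14b", "prompt": "hello", "keep_alive": "60s" }
```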
(Just to note: it seems the fact that Ollama wasn't able to load embedding models on the GPU was a bug. I don't face it anymore on 0.3.8; I think I was on 0.3.6 before.)
This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.
This still relevant
Hey @johnnyasantoss, could you show how you are using keep alive?
```yaml
models:
  - name: Qwen3 14b
    provider: ollama
    model: qwen3:14b
    apiBase: ....
    defaultCompletionOptions:
      keepAlive: 600 # (10 minutes)
```
This is what I am using, but running `ollama ps` I still see 30 minutes.
Thanks!
could you show how you are using keep alive?
Yes, it's the completionOptions setting in ~/.continue/config.json:
```json
"completionOptions": {
  "keepAlive": 120
}
```
Hey @johnnyasantoss, do you by any chance know what the filename would be on Windows?
Thanks!
> Hey @johnnyasantoss, do you by any chance know what the filename would be on Windows?
Idk, but it's probably in %APPDATA%
This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.
This issue was closed because it wasn't updated for 10 days after being marked stale. If it's still important, please reopen + comment and we'll gladly take another look!