
Add keep alive to the embedding model config

Open johnnyasantoss opened this issue 1 year ago • 6 comments

Validations

  • [X] I believe this is a way to improve. I'll try to join the Continue Discord for questions
  • [X] I'm not able to find an open issue that requests the same enhancement

Problem

The other model configurations allow setting keep alive (which is really useful with Ollama), so it would be nice to have that on the embedding model as well.

Solution

Add the keepAlive config under the embed model config

johnnyasantoss avatar Aug 16 '24 01:08 johnnyasantoss

I don't know whether this is relevant for you, since I don't know your use case. But for anyone in my situation: I wanted to prevent the embedding model from unloading my big LLM every time (it takes up 95% of my GPU). So, here is a simple workaround:

  • set the OLLAMA_KEEP_ALIVE env var to -1 (global setting for all models)
  • set num_gpu to 0 in the Ollama modelfile for the embedding model

That way:

  • the embedding model always stays loaded in RAM (which has little impact, as embedding models are generally very light)
  • the Big model stays loaded in VRAM :)

https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately

blakkd avatar Aug 20 '24 21:08 blakkd

OK, deleting my comment as it's not relevant: I see that Ollama doesn't actually load embedding models onto the GPU, even if you set num_gpu to 10 or so. So my comment was confusing!

blakkd avatar Aug 21 '24 22:08 blakkd

@blakkd thank you anyway, I didn't know that. My use case is that I want to free up RAM on my M3 Mac while the model isn't running. Even though the models aren't that big, it would still be nice to have the memory for other applications without having to close VS Code or whatever is using Continue.

johnnyasantoss avatar Aug 22 '24 16:08 johnnyasantoss

Oh, I see. But then why do you want to set keep_alive? It's meant to do exactly the opposite :thinking: However, maybe setting a duration value < 5 min (counted from the last inference) could help you. The default is 5 min.

blakkd avatar Aug 22 '24 18:08 blakkd

Yes, that's what I want: set it to 60s, which is a reasonable time between file saves while editing and long enough to know that I'm no longer coding.
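
For anyone calling Ollama directly, the same 60s timeout can be requested per call through the keep_alive field of the API instead of a global setting. A minimal Python sketch, assuming a local server at the default address; the URL, the nomic-embed-text model name, and both function names are illustrative assumptions, not something Continue does internally:

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embed"


def build_embed_request(text, model="nomic-embed-text", keep_alive=60):
    """Build the JSON body for an embedding call with a per-request keep_alive.

    A plain number is interpreted as seconds. Ollama also accepts duration
    strings such as "5m", 0 to unload right after the request, and a
    negative value to keep the model loaded indefinitely.
    """
    return {"model": model, "input": text, "keep_alive": keep_alive}


def embed(text, **kwargs):
    """Send the request to a running Ollama server and return the parsed reply."""
    body = json.dumps(build_embed_request(text, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling embed("some text", keep_alive=60) should leave the embedding model loaded for about a minute after the last request, which can be checked with ollama ps.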

johnnyasantoss avatar Aug 22 '24 18:08 johnnyasantoss

(Just to inform: it seems that Ollama not being able to load the embedding models on the GPU was a bug. I don't face it anymore on 0.3.8. I think I was on 0.3.6 before.)

blakkd avatar Aug 29 '24 19:08 blakkd

This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.

github-actions[bot] avatar Mar 03 '25 04:03 github-actions[bot]

This is still relevant.

johnnyasantoss avatar Mar 13 '25 17:03 johnnyasantoss

Hey @johnnyasantoss, could you show how you are using keep alive?

models:
  - name: Quen3 14b
    provider: ollama
    model: quen3:14b
    apiBase: ....
    defaultCompletionOptions:
      keepAlive: 600  # (10 minutes)

This is what I am using, but when I run ollama ps I still get 30 minutes.

Thanks!

pfcouto avatar May 28 '25 17:05 pfcouto

could you show how you are using keep alive?

Yes, it's the completionOptions setting in ~/.continue/config.json.


  "completionOptions": {
    "keepAlive": 120
  }
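
If you would rather script that change than edit the file by hand, something like the following Python sketch could work. It assumes the config is plain JSON without comments and that completionOptions sits at the top level as in the snippet above; set_keep_alive and update_config_file are hypothetical helper names, and the thread does not confirm that the embedding model honors this value:

```python
import json
from pathlib import Path


def set_keep_alive(config, seconds):
    """Return a copy of a config dict with completionOptions.keepAlive set."""
    updated = dict(config)
    # Copy the nested dict too, so the caller's config is left untouched.
    options = dict(updated.get("completionOptions", {}))
    options["keepAlive"] = seconds
    updated["completionOptions"] = options
    return updated


def update_config_file(path, seconds):
    """Rewrite the JSON config at `path` in place with the new keepAlive."""
    path = Path(path).expanduser()
    config = json.loads(path.read_text(encoding="utf-8"))
    new_text = json.dumps(set_keep_alive(config, seconds), indent=2)
    path.write_text(new_text, encoding="utf-8")
```

For example, update_config_file("~/.continue/config.json", 120) would reproduce the setting shown above; back up the file first, since the rewrite normalizes the formatting.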

johnnyasantoss avatar Jul 10 '25 18:07 johnnyasantoss

Hey @johnnyasantoss, do you by any chance know what the filename would be on Windows?

Thanks!

pfcouto avatar Jul 17 '25 16:07 pfcouto

Hey @johnnyasantoss, do you by any chance know what the filename would be on Windows?

Idk, but it's probably in %APPDATA%

johnnyasantoss avatar Jul 22 '25 20:07 johnnyasantoss

This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.

github-actions[bot] avatar Oct 21 '25 02:10 github-actions[bot]

This issue was closed because it wasn't updated for 10 days after being marked stale. If it's still important, please reopen + comment and we'll gladly take another look!

github-actions[bot] avatar Oct 31 '25 02:10 github-actions[bot]