Copilot with Local Models via Ollama
Description
It appears marimo supports both GitHub Copilot and Codeium for code completion; I am hoping to use locally hosted models served by Ollama as a third option.
Suggested solution
The `[completion]` section in `~/.marimo.toml` would look like:

```toml
[completion]
copilot = "ollama"
api_key = "ollama"
model = "codeqwen:7b-code-v1.5-q5_1"
base_url = "http://localhost:11434/v1/chat/completions"
```
Alternative
No response
Additional context
No response
We do support Ollama for prompting to refactor/create cells. We can look into adding support for code completion as well.
I think we can just update `copilot` and have the rest of the settings like `model`/`base_url` inherit from the `[ai.open_ai]` config:

```diff
+ copilot: Union[bool, Literal["github", "codeium", "ollama"]]
- copilot: Union[bool, Literal["github", "codeium"]]
```
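For illustration, a shared setup along these lines might look roughly like the sketch below, where `[completion]` only selects the provider and everything else is read from the existing `[ai.open_ai]` section (the exact keys here are assumptions based on this discussion, not a finalized schema):

```toml
# Rough sketch, not the final schema: completion reuses the chat provider settings.
[completion]
copilot = "ollama"

[ai.open_ai]
# Ollama exposes an OpenAI-compatible API; the key just needs to be non-empty.
api_key = "ollama"
model = "codeqwen:7b-code-v1.5-q5_1"
base_url = "http://localhost:11434/v1"
```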
@zhibor Couple of questions with the approach:

- Would you normally want a different `model` for code completion than for the AI to create/modify cells?
- Same as above, but would the `base_url` be different?
@mscolnick: I greatly appreciate your prompt response!
- Yes, the user may choose a different model to fit their hardware and/or their needs.
- The `base_url` could be different based on how it's deployed.
@zhibor - could you explain a bit more? Those reasons seem like they would apply to both code completion and chat. Can you even run more than one Ollama server at a time?
- Users can choose a model that best suits their system resources, requirements, and preferences.
- Deployment variations (local, on-prem, cloud) may require adjusting the `base_url` to reflect the specific hosting setup.

Most of the configuration applies to both code completion and chat. And yes, you can run more than one Ollama server in parallel if the use case requires it.
Got it, thanks for the exploration @zhibor.
We can probably start off with sharing the config between chat and completion in a smaller PR - and later add diverging config if it turns out to be a common case.
Thanks for sharing your great work! Let me know how I may help and contribute.
If you'd like to give this a shot, most of the logic should live in this file: https://github.com/marimo-team/marimo/blob/main/frontend/src/core/codemirror/copilot/extension.ts#L27
And if you do try it out, we can just do the minimal config change, without adding another `base_url` for now:

```diff
+ copilot: Union[bool, Literal["github", "codeium", "ollama"]]
- copilot: Union[bool, Literal["github", "codeium"]]
```
If not, I may be able to get to this in a week or so.
That makes sense. My TypeScript skills are a bit rusty since I've been using Python mostly these days, so I'll let you handle it when you get the chance. There's definitely no rush. Thanks very much!
> Got it, thanks for the exploration @zhibor.
> We can probably start off with sharing the config between chat and completion in a smaller PR - and later add diverging config if it turns out to be a common case.
I would recommend starting with separate `base_url`s.
The models used for completion are typically the pure 'Coder' models, whereas the chat models are typically 'Coder-Instruct' models.
For example, I'm currently using "Qwen2.5-Coder-32B-Instruct" hosted by the llama.cpp llama-server on one port for chat (this is working nicely in marimo).
And for completion I'm having to drop out to my vim editor (with ggml-org/llama.vim) to use "Qwen2.5-Coder-7B" on a different port. (It would be nice to be able to stay in marimo when I need completion.)
Using an 'Instruct' model for completion does still work, but it's less useful/reliable.
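For concreteness, a diverging configuration along these lines could look something like the sketch below, reusing the `[completion]` section from the original suggestion for the coder model and `[ai.open_ai]` for the instruct/chat model; the section layout, keys, and model tags are illustrative assumptions rather than marimo's actual schema:

```toml
# Hypothetical sketch: separate model and base_url for completion vs. chat.
[completion]
copilot = "ollama"
# Pure coder model for inline completion, served on its own port.
model = "qwen2.5-coder:7b"
base_url = "http://localhost:11435/v1"

[ai.open_ai]
# Instruct-tuned model for chat and cell generation, served separately.
api_key = "ollama"
model = "qwen2.5-coder:32b-instruct"
base_url = "http://localhost:11434/v1"
```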
This is a great callout @psymonryan, thanks for walking through that.
+1
I have a PR for this feature here: https://github.com/marimo-team/marimo/pull/4136
If anyone would like to check out the branch, run it locally, and provide feedback, it would be greatly appreciated.