Copilot with Local Models via Ollama
Description
It appears marimo supports both GitHub Copilot and Codeium for code completion; I am hoping to use locally hosted models served by Ollama as a third option.
Suggested solution
The `[completion]` section in `~/.marimo.toml` would look like:

```toml
[completion]
copilot = "ollama"
api_key = "ollama"
model = "codeqwen:7b-code-v1.5-q5_1"
base_url = "http://localhost:11434/v1/chat/completions"
```
Alternative
No response
Additional context
No response
We do support Ollama for prompting to refactor/create cells. We can look into adding support for code completion as well.
I think we can just update `copilot` and have the rest of the settings like `model`/`base_url` inherit from the `[ai.open_ai]` config:

```diff
+ copilot: Union[bool, Literal["github", "codeium", "ollama"]]
- copilot: Union[bool, Literal["github", "codeium"]]
```
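For illustration, a shared setup along these lines might look roughly like the sketch below, where `[completion]` only selects the provider and everything else is read from the existing `[ai.open_ai]` section (the exact keys here are assumptions based on this discussion, not a finalized schema):

```toml
# Rough sketch, not the final schema: completion reuses the chat provider settings.
[completion]
copilot = "ollama"

[ai.open_ai]
# Ollama exposes an OpenAI-compatible API; the key just needs to be non-empty.
api_key = "ollama"
model = "codeqwen:7b-code-v1.5-q5_1"
base_url = "http://localhost:11434/v1"
```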
@zhibor Couple of questions with the approach:

- Would you normally want a different `model` for code completion than for the AI to create/modify cells?
- Same as above, but would the `base_url` be different?
@mscolnick: I greatly appreciate your prompt response!
- Yes, the user may choose a different model to fit their hardware and/or their needs.
- The `base_url` could be different based on how it's deployed.
@zhibor - could you explain a bit more? Those reasons seem like they would apply to both code completion and chat. Can you even run more than one Ollama server at a time?
- Users can choose a model that best suits their system resources, requirements, and preferences.
- Deployment variations (local, on-prem, cloud) may require adjusting the `base_url` to reflect the specific hosting setup.

Most of the configuration applies to both code completion and chat. And yes, you can run more than one Ollama server in parallel if the use case requires it.
Got it, thanks for the exploration @zhibor.
We can probably start off with sharing the config between chat and completion in a smaller PR - and later add diverging config if it turns out to be a common case.
Thanks for sharing your great work! Let me know how I may help and contribute.
If you'd like to give this a shot, most of the logic should live in this file: https://github.com/marimo-team/marimo/blob/main/frontend/src/core/codemirror/copilot/extension.ts#L27
And if you do try it out, we can just do the minimal config change, without adding another `base_url` for now:

```diff
+ copilot: Union[bool, Literal["github", "codeium", "ollama"]]
- copilot: Union[bool, Literal["github", "codeium"]]
```
If not, I may be able to get to this in a week or so.
That makes sense. My TypeScript skills are a bit rusty since I've been using Python mostly these days, so I'll let you handle it when you get the chance. There's definitely no rush. Thanks very much!
> Got it, thanks for the exploration @zhibor.
> We can probably start off with sharing the config between chat and completion in a smaller PR - and later add diverging config if it turns out to be a common case.
I would recommend starting with separate `base_url`s.
The models used for completion are typically the pure 'Coder' models, whereas the chat models are typically 'Coder-Instruct' models.
For example, I'm currently using "Qwen2.5-Coder-32B-Instruct" hosted by the llama.cpp llama-server on one port for chat (this is working nicely in marimo).
And for completion I'm having to drop out to my vim editor (with ggml-org/llama.vim) to use "Qwen2.5-Coder-7B" on a different port. (It would be nice to be able to stay in marimo when I need completion.)
Using an 'Instruct' model for completion does still work, but it's less useful/reliable.
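For concreteness, a diverging configuration along these lines could look something like the sketch below, reusing the `[completion]` section from the original suggestion for the coder model and `[ai.open_ai]` for the instruct/chat model; the section layout, keys, and model tags are illustrative assumptions rather than marimo's actual schema:

```toml
# Hypothetical sketch: separate model and base_url for completion vs. chat.
[completion]
copilot = "ollama"
# Pure coder model for inline completion, served on its own port.
model = "qwen2.5-coder:7b"
base_url = "http://localhost:11435/v1"

[ai.open_ai]
# Instruct-tuned model for chat and cell generation, served separately.
api_key = "ollama"
model = "qwen2.5-coder:32b-instruct"
base_url = "http://localhost:11434/v1"
```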
This is a great callout @psymonryan, thanks for walking through that.
+1
I have a PR for this feature here: https://github.com/marimo-team/marimo/pull/4136
If anyone would like to check out the branch, run it locally, and provide feedback, it would be greatly appreciated.