
Copilot with Local Models via Ollama

z-ai-lab opened this issue 1 year ago · 11 comments

Description

It appears marimo supports both GitHub Copilot and Codeium Copilot; I am hoping to use locally hosted models on Ollama as a third option.

Suggested solution

The [completion] section in ~/.marimo.toml would look like this:

[completion]
copilot = "ollama"
api_key = "ollama"
model = "codeqwen:7b-code-v1.5-q5_1"
base_url = "http://localhost:11434/v1/chat/completions"

Alternative

No response

Additional context

No response

z-ai-lab avatar Sep 02 '24 01:09 z-ai-lab

We do support Ollama for prompting to refactor/create cells. We can look into adding support for code completion as well.

I think we can just update copilot and have the rest of the settings, like model/base_url, inherit from the [ai.open_ai] config:

+ copilot: Union[bool, Literal["github", "codeium", "ollama"]]
- copilot: Union[bool, Literal["github", "codeium"]]
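
For illustration, the shared setup could look roughly like the sketch below in ~/.marimo.toml; the exact fields under [ai.open_ai] are assumptions based on this discussion, not a final schema:

[completion]
# "ollama" would be the new option alongside "github" and "codeium"
copilot = "ollama"

# Chat settings that completion would inherit rather than duplicating
[ai.open_ai]
api_key = "ollama"
model = "codeqwen:7b-code-v1.5-q5_1"
base_url = "http://localhost:11434/v1"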

@zhibor A couple of questions about the approach:

  1. Would you normally want a different model for code completion than you would for the AI that creates/modifies cells?
  2. Same as above, but would the base_url be different?

mscolnick avatar Sep 02 '24 18:09 mscolnick

@mscolnick: I greatly appreciate your prompt response!

  1. Yes, the user may choose a different model to fit their hardware and/or their needs.
  2. The base_url could be different based on how it's deployed.

z-ai-lab avatar Sep 02 '24 18:09 z-ai-lab

@zhibor - could you explain a bit more? Those reasons seem like they would apply to both code completion and chat. Can you even run more than one Ollama server at a time?

mscolnick avatar Sep 02 '24 20:09 mscolnick

  1. Users can choose the model that best suits their system resources, requirements, and preferences.
  2. Deployment variations (local, on-prem, cloud) may require adjusting the base_url to reflect the specific hosting method.

Most configuration applies to both code completion and chat. Yes, the implementation allows running more than one Ollama server in parallel if the scenario or usage requires it.

z-ai-lab avatar Sep 02 '24 22:09 z-ai-lab

Got it, thanks for the exploration @zhibor.

We can probably start off with sharing the config between chat and completion in a smaller PR - and later add diverging config if it turns out to be a common case.

mscolnick avatar Sep 02 '24 23:09 mscolnick

Thanks for sharing your great work! Let me know how I may help and contribute.

z-ai-lab avatar Sep 03 '24 01:09 z-ai-lab

If you'd like to give this a shot, most of the logic should live in this file: https://github.com/marimo-team/marimo/blob/main/frontend/src/core/codemirror/copilot/extension.ts#L27

And if you do try it out, we can just do the minimal config change, without adding another base_url for now:

+ copilot: Union[bool, Literal["github", "codeium", "ollama"]]
- copilot: Union[bool, Literal["github", "codeium"]]

If not, I may be able to get to this in a week or so.

mscolnick avatar Sep 03 '24 02:09 mscolnick

That makes sense. My TypeScript skills are a bit rusty since I've been using Python mostly these days, so I'll let you handle it when you get the chance. There's definitely no rush. Thanks very much!

z-ai-lab avatar Sep 03 '24 10:09 z-ai-lab

> Got it, thanks for the exploration @zhibor.
>
> We can probably start off with sharing the config between chat and completion in a smaller PR - and later add diverging config if it turns out to be a common case.

I would recommend starting with separate base_urls.

The models used for completion are typically pure 'Coder' models, whereas the chat models are typically 'Coder-Instruct' models.

For example, I'm currently using "Qwen2.5-Coder-32B-Instruct" hosted by the llama.cpp llama-server on one port for chat (this is working nicely in marimo).

And for completion I'm having to drop out to my vim editor (with ggml-org/llama.vim) to use "Qwen2.5-Coder-7B" on a different port. (It would be nice to be able to stay in marimo when I need completion.)

Using an 'Instruct' model for completion does still work, but it's less useful/reliable.
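
To make the separate-base_url case concrete, a diverging config might look roughly like the sketch below; the ports, section layout, and field names are assumptions for illustration only:

# Chat: instruct-tuned model served on one port
[ai.open_ai]
model = "Qwen2.5-Coder-32B-Instruct"
base_url = "http://localhost:8080/v1"

# Completion: pure coder model served on a second port
[completion]
copilot = "ollama"
model = "Qwen2.5-Coder-7B"
base_url = "http://localhost:8081/v1"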

psymonryan avatar Nov 14 '24 00:11 psymonryan

This is a great callout @psymonryan, thanks for walking through that.

mscolnick avatar Nov 14 '24 00:11 mscolnick

+1

s-celles avatar Nov 22 '24 19:11 s-celles

I have a PR for this feature here: https://github.com/marimo-team/marimo/pull/4136

If anyone would like to check out the branch, run it locally, and provide feedback, it would be greatly appreciated.

mscolnick avatar Mar 17 '25 20:03 mscolnick