Support New Fill In The Middle API for Ollama
The Ollama project recently merged this PR, which adds support for fill-in-the-middle completions via the existing generate endpoint. Would love to see this supported in Custom Suggestion Service as well.
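For context, a minimal sketch of what a request to the new API might look like, assuming the merged PR exposes the capability as a `suffix` field on the existing `/api/generate` endpoint (model name and code snippets are placeholders):

```python
# Hypothetical sketch: fill-in-the-middle via Ollama's /api/generate,
# assuming the merged PR adds a "suffix" field alongside "prompt".
import json
import urllib.request

body = {
    "model": "codellama:13b-code",  # placeholder model tag
    "prompt": "def fib(n):",        # code before the cursor
    "suffix": "    return fib(n - 2) + fib(n - 1)",  # code after the cursor
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])  # the infilled middle
```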
Do you know which model supports the new API? I have tried several models, but all of them complain that the model does not support insert.
This would be really useful. I know that the Continue plugin for VS Code works perfectly with Ollama/codellama/starcoder2. Maybe that would be a starting point?
@jmitek This model does support the new API, but the result is super weird and worse than using the fill-in-the-middle strategy and filling in the template manually. Are you using the 3b or the 7b model?
(Note: you can already use FIM-capable models through the completion API as long as you know the template.)
@intitni I'm mostly using codellama:13b. Also starcoder2:15b. I've no idea what the template might look like for them, though.
@jmitek The default one is for codellama. The starcoder one looks like:

```
<fim-prefix>def fib(n):<fim-suffix> else:\n return fib(n - 2) + fib(n - 1)<fim-middle>
```
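To make the manual approach concrete, a rough sketch of filling such a template yourself (tag spelling copied verbatim from the line above; check your model's card for the exact tokens):

```python
# Fill the starcoder-style FIM template by hand and send the result as an
# ordinary completion; the model's reply is the missing middle part.
prefix = "def fib(n):"
suffix = " else:\n    return fib(n - 2) + fib(n - 1)"
prompt = f"<fim-prefix>{prefix}<fim-suffix>{suffix}<fim-middle>"
```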
Interesting. So if I connect it to codellama, here are my settings:
This is the kind of completion I get in Xcode:
Using "Default" has similar results. I notice that it is using the /chat api, though I would have expected /generate api instead?
But if I choose the "Continue" one, it looks a bit more reasonable, though it seems to duplicate the preceding lines of code. I haven't tried starcoder2 yet.
@jmitek Please change the model to the completion API and set it up again. Imported models are treated as using the chat completions API. You may also need the -code variant.
@intitni Thanks, so I made it use /generate again and filled in the template exactly as you have it (the default one I had was different). I can see it is using /generate now, and the suggestions seem okay, except they still have the duplicated code. See this example:
Before:
After:
@jmitek It works fine on my Mac, though. What are your settings again?
> filled in the template exactly as you have it (the default one I had was different)
Oh, the default one is actually correct; I was testing starcoder2 when I made the screenshot.
Here are my updated settings:
I see that you are using codellama:7b-code; I'm using the non "-code" version, if that makes any difference?
@jmitek You need to reset the template to the default one
I don't know much about the -code suffix; I found it in the documentation (https://ollama.com/library/codellama), in the fill-in-the-middle section.
Awesome! I pulled codellama:13b-code and used this template from the Ollama site (https://ollama.com/library/codellama:13b-code):

```
<PRE> {prefix} <SUF>{suffix} <MID>
```
which is the same as the default one in the app.
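For comparison with the starcoder sketch earlier, the same manual fill using this codellama template (the spaces around the tags are kept exactly as on the model page):

```python
# codellama-style FIM template from the model page, filled the same way.
prefix = "def fib(n):"
suffix = "    return fib(n - 2) + fib(n - 1)"
prompt = f"<PRE> {prefix} <SUF>{suffix} <MID>"
```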
So it seems to break with the non "-code" version, although the Continue plugin is somehow able to use the non "-code" version...
Anyway it works perfectly :)
Maybe they are using their own prompt strategy. Can you see what prompt they are sending to Ollama?
I have tried the starcoder2 model with Ollama. The completion result was different from Continue's. I found that Continue's request has a 'raw' parameter set to true. I changed the source code to add the 'raw' parameter to the request, and then the completion was the same as Continue's.
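For anyone following along, a hedged sketch of what that change amounts to (model tag and FIM tokens are illustrative; `raw: true` tells Ollama to pass the hand-built prompt to the model without applying its own template):

```python
# Sketch of the Continue-style request: raw=true skips Ollama's prompt
# templating, so the hand-built FIM prompt reaches the model verbatim.
import json
import urllib.request

body = {
    "model": "starcoder2:7b",  # illustrative model tag
    "prompt": "<fim-prefix>def fib(n):<fim-suffix>\n    return fib(n - 2) + fib(n - 1)<fim-middle>",
    "raw": True,      # bypass the model's own template
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```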
@sonofsky2010 Hi, I have tried the raw parameter, but I am still getting weird output from starcoder2:7b (there are template tags in the output). Do you have a complete request body that the Continue plugin sent?
@intitni I checked Continue's request. I think it may be caused by wrong 'stop' parameters. Right now it sends 'stop' as an empty list, which may override the stop parameters that Ollama reads from the model's parameters.
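If that is the cause, one plausible fix, sketched under the assumption that the empty list is the culprit: pass the model's stop tokens explicitly in the request's options (token spellings here are illustrative; take the real ones from the model file):

```python
# Explicit stop tokens (instead of an empty list) so template tags don't
# leak into the completion; an empty "stop" may override the stop words
# baked into the model file. Token spellings are illustrative.
body = {
    "model": "starcoder2:7b",
    "prompt": prompt,  # the hand-filled FIM prompt from the earlier sketches
    "raw": True,
    "stream": False,
    "options": {"stop": ["<fim-prefix>", "<fim-suffix>", "<fim-middle>", "<|endoftext|>"]},
}
```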