Support New Fill In The Middle API for Ollama
The Ollama project recently merged this PR, which adds support for fill-in-the-middle completions via the existing generate endpoint. Would love to see this supported in Custom Suggestion Service as well.
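For context, a minimal sketch of what a request to the new API might look like, assuming the merged PR exposes the capability as a `suffix` field on the existing `/api/generate` endpoint (model name and code snippets are placeholders):

```python
# Hypothetical sketch: fill-in-the-middle via Ollama's /api/generate,
# assuming the merged PR adds a "suffix" field alongside "prompt".
import json
import urllib.request

body = {
    "model": "codellama:13b-code",  # placeholder model tag
    "prompt": "def fib(n):",        # code before the cursor
    "suffix": "    return fib(n - 2) + fib(n - 1)",  # code after the cursor
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])  # the infilled middle
```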
Do you know which model supports the new API? I have tried several models, but all of them complain that the model does not support insert.
This would be really useful. I know that the Continue plugin for VS Code works perfectly with Ollama/codellama/starcoder2. Maybe that would be a starting point?
@jmitek This model does support the new API, but the result is super weird and worse than using the fill-in-the-middle strategy and filling in the template manually. Are you using the 3b or the 7b model?
(Note: you can already use FIM-capable models through the completion API as long as you know the template.)
@intitni I'm mostly using codellama:13b. Also starcoder2:15b. I've no idea what the template might look like for them, though.
@jmitek The default one is for codellama. The starcoder one looks like:

```
<fim-prefix>def fib(n):<fim-suffix> else:\n return fib(n - 2) + fib(n - 1)<fim-middle>
```
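To make the manual approach concrete, a rough sketch of filling such a template yourself (tag spelling copied verbatim from the line above; check your model's card for the exact tokens):

```python
# Fill the starcoder-style FIM template by hand and send the result as an
# ordinary completion; the model's reply is the missing middle part.
prefix = "def fib(n):"
suffix = " else:\n    return fib(n - 2) + fib(n - 1)"
prompt = f"<fim-prefix>{prefix}<fim-suffix>{suffix}<fim-middle>"
```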
Interesting. So if I connect it to codellama, here are my settings:
This is the kind of completion I get in Xcode:
Using "Default" has similar results. I notice that it is using the /chat api, though I would have expected /generate api instead?
But if I choose the "Continue" one, it looks a bit more reasonable, though it seems to duplicate the preceding lines of code. I haven't tried starcoder2 yet.
@jmitek Please change the model to the completion API and set it up again. Imported models are treated as using the chat completions API. You may also need the -code variant.
@intitni Thanks, so I made it use /generate again and filled in the template exactly as you have it (the default one I had was different). I can see it is using /generate now, and the suggestions seem okay, except they still have the duplicated code. See this example:
Before:
After:
@jmitek It works fine on my Mac, though. What are your settings again?
> filled in the template exactly as you have it (the default one I had was different)
Oh, the default one is actually correct; I was testing starcoder2 when I made the screenshot.
Here are my updated settings:
I see that you are using codellama:7b-code; I'm using the non "-code" version, if that makes any difference?
@jmitek You need to reset the template to the default one
I don't know much about the -code suffix; I found it in the documentation (https://ollama.com/library/codellama), in the fill-in-the-middle section.
Awesome! I pulled codellama:13b-code and used this template from the Ollama site (https://ollama.com/library/codellama:13b-code):

```
<PRE> {prefix} <SUF>{suffix} <MID>
```
which is the same as the default one in the app.
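For comparison with the starcoder sketch earlier, the same manual fill using this codellama template (the spaces around the tags are kept exactly as on the model page):

```python
# codellama-style FIM template from the model page, filled the same way.
prefix = "def fib(n):"
suffix = "    return fib(n - 2) + fib(n - 1)"
prompt = f"<PRE> {prefix} <SUF>{suffix} <MID>"
```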
So it seems to break with the non "-code" version, although the Continue plugin is somehow able to use the non "-code" version...
Anyway it works perfectly :)
Maybe they are using their own prompt strategy. Can you see what prompt they are sending to Ollama?
I have tried the starcoder2 model with Ollama. The completion result was different from Continue's. I found that Continue's request has a 'raw' parameter set to true. I changed the source code to add the 'raw' parameter to the request, and then the completion was the same as Continue's.
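For anyone following along, a hedged sketch of what that change amounts to (model tag and FIM tokens are illustrative; `raw: true` tells Ollama to pass the hand-built prompt to the model without applying its own template):

```python
# Sketch of the Continue-style request: raw=true skips Ollama's prompt
# templating, so the hand-built FIM prompt reaches the model verbatim.
import json
import urllib.request

body = {
    "model": "starcoder2:7b",  # illustrative model tag
    "prompt": "<fim-prefix>def fib(n):<fim-suffix>\n    return fib(n - 2) + fib(n - 1)<fim-middle>",
    "raw": True,      # bypass the model's own template
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```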
@sonofsky2010 Hi, I have tried the raw parameter, but I am still getting weird output from starcoder2:7b (there are template tags in the output). Do you have a complete request body that the Continue plugin sent?
@intitni I checked Continue's request. I think it may be caused by wrong 'stop' parameters. Right now it sends 'stop' as an empty list, which may override the stop parameters that Ollama reads from the model's parameters.
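If that is the cause, one plausible fix, sketched under the assumption that the empty list is the culprit: pass the model's stop tokens explicitly in the request's options (token spellings here are illustrative; take the real ones from the model file):

```python
# Explicit stop tokens (instead of an empty list) so template tags don't
# leak into the completion; an empty "stop" may override the stop words
# baked into the model file. Token spellings are illustrative.
body = {
    "model": "starcoder2:7b",
    "prompt": prompt,  # the hand-filled FIM prompt from the earlier sketches
    "raw": True,
    "stream": False,
    "options": {"stop": ["<fim-prefix>", "<fim-suffix>", "<fim-middle>", "<|endoftext|>"]},
}
```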