
Using Qwen2.5-Coder with LM Studio

c0008 opened this issue 1 year ago · 6 comments

I am trying to set up code completion with the Qwen2.5-Coder models. I have tested different model versions and different settings for "Request Strategy", but it is not working well. Either the configuration is not compatible with LM Studio, or the model responds with `` or <code> most of the time, i.e. the model tries to emit formatting tags.

The best configuration for Qwen would be the "Naive" setting, as the models are trained for this situation (source: https://github.com/QwenLM/Qwen2.5-Coder?tab=readme-ov-file#1-basic-usage). However, it is not working, because LM Studio doesn't accept an empty system prompt like this:

    {
      "role": "system",
      "content": ""
    },

Maybe a new option could be added to set a custom system prompt, or to remove this empty message.
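For reference, here is a minimal sketch (not the extension's actual code) of the kind of chat completions request involved, assuming LM Studio's OpenAI-compatible server on its default port 1234 and a placeholder model name:

    import requests

    # Minimal sketch: a chat completions request carrying the empty system
    # message, sent to LM Studio's OpenAI-compatible server (default port
    # 1234). The model id is a placeholder for whatever model is loaded.
    payload = {
        "model": "qwen2.5-coder-7b-instruct",
        "messages": [
            {"role": "system", "content": ""},   # the empty message LM Studio rejects
            {"role": "user", "content": "def fib(n):"},
        ],
    }
    resp = requests.post("http://localhost:1234/v1/chat/completions", json=payload)
    print(resp.status_code, resp.text)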

The Qwen2.5-Coder models are also trained for fill-in-the-middle (FIM). This is the prompt format:

prompt = '<|fim_prefix|>' + prefix_code + '<|fim_suffix|>' + suffix_code + '<|fim_middle|>'

(source: https://github.com/QwenLM/Qwen2.5-Coder?tab=readme-ov-file#3-file-level-code-completion-fill-in-the-middle). I put the following into the 'FIM Template' text field, but I am still getting responses like ``:

<|fim_prefix|> {prefix} <|fim_suffix|> {suffix} <|fim_middle|>

Maybe there is a conflict with the given system prompt, which instructs the model to use other strings. The fill-in-the-middle setting without a system prompt is not working, for the same reason as the "Naive" setting.
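For concreteness, a small sketch of what the assembled FIM prompt looks like for a real snippet, following the format from the Qwen README (the tokens are concatenated directly against the code):

    # Sketch: assembling a Qwen2.5-Coder FIM prompt for a concrete snippet,
    # per the format in the Qwen README. The model is expected to generate
    # the code that belongs between prefix and suffix (here: "return a + b").
    prefix_code = "def add(a, b):\n    "
    suffix_code = "\n\nprint(add(1, 2))\n"

    prompt = "<|fim_prefix|>" + prefix_code + "<|fim_suffix|>" + suffix_code + "<|fim_middle|>"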

c0008 avatar Oct 02 '24 22:10 c0008

I am no expert in LLMs, so I honestly don't know how the prompt template actually works. My guess is that when you use the API from LM Studio, it fills your input into the model's built-in prompt template, so the input from the FIM strategy gets embedded into another template. You can try changing the template of the model in LM Studio to see if there is any difference.

If you find that any change is needed in this project, let me know. Pull requests are always welcome.

intitni avatar Oct 03 '24 06:10 intitni

I got it working now by using the "Completion API" setting instead of "Chat Completion API". It works because this setting does not send an empty system prompt. It would still be nice if one could set a custom system prompt instead of an empty one.

You don't have to worry about the chat template (like ChatML for Qwen). By default this is handled by the server, and for plain completion tasks the chat template is not even required.
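At the HTTP level, the difference looks roughly like this (a sketch, assuming LM Studio's default port 1234 and a placeholder model name): the raw FIM prompt goes to /v1/completions, with no messages array and therefore no empty system prompt.

    import requests

    # Sketch: a plain completions request. There is no messages array and
    # therefore no empty system prompt; the raw FIM prompt is sent as-is.
    # Port 1234 is LM Studio's default; the model id is a placeholder.
    payload = {
        "model": "qwen2.5-coder-7b",
        "prompt": "<|fim_prefix|>def add(a, b):\n    <|fim_suffix|>\n<|fim_middle|>",
        "max_tokens": 64,
        "temperature": 0.2,
    }
    resp = requests.post("http://localhost:1234/v1/completions", json=payload)
    print(resp.json()["choices"][0]["text"])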

c0008 avatar Oct 07 '24 06:10 c0008

As mentioned in the README, it's not recommended to use the chat completions API because the LLM will tend to respond as if it's chatting with you.

If you are using the FIM strategy, you are basically sending a raw prompt that contains the prompt template (for example, <|fim_prefix|> is a template token). The behavior will differ between LLMs, though.

intitni avatar Oct 07 '24 06:10 intitni

@intitni @c0008

#19

It seems that the stream was not properly detecting the completion state in chat responses. I've changed the done condition to check for the presence of finish_reason instead of relying on content parsing. This fixes premature stream termination and ensures all response chunks are properly processed.
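In other words, the done condition now looks roughly like this (a sketch in Python; the project itself is Swift):

    import json

    def is_stream_done(sse_line: str) -> bool:
        """Sketch of the revised done condition: the stream is finished when
        a chunk carries a non-null finish_reason (or the server sends the
        [DONE] sentinel), rather than when the content looks empty."""
        data = sse_line.removeprefix("data: ").strip()
        if data == "[DONE]":
            return True
        chunk = json.loads(data)
        # finish_reason stays null until the final chunk of the stream.
        return chunk["choices"][0].get("finish_reason") is not None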

gbfansheng avatar Nov 18 '24 02:11 gbfansheng

After giving Qwen2.5-Coder Instruct 7B a try in LM Studio, I noticed that the default prompt template is just incorrect because they don't have a preset for Qwen. As long as you select a preset like ChatML, the response will be in the correct format (but you should always use the official template).

I tried the changes from #19 in Copilot for Xcode and it can still generate <tags> sometimes when the template is incorrect.
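For reference, ChatML (which Qwen's official template follows) wraps every turn like this, with placeholders in braces; a preset that deviates from this structure is what produces those stray tags:

    <|im_start|>system
    {system prompt}<|im_end|>
    <|im_start|>user
    {user message}<|im_end|>
    <|im_start|>assistant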

intitni avatar Nov 28 '24 08:11 intitni

> I noticed that the default prompt template is just incorrect because they don't have a preset for Qwen.

A correct prompt template should come with the GGUF or MLX file, but of course it depends on who created the file. If in doubt, the easiest thing to do is to download the same model from another source.

c0008 avatar Nov 28 '24 17:11 c0008