Add more configs for autocompletion

Open dimitree54 opened this issue 1 year ago • 11 comments

Describe the need of your request

Inline code completion is a very useful feature, but it would be awesome to make it more customisable.

For example, it currently always generates many lines per completion. I can limit the maximum number of completion tokens, but that produces partial lines, so the completed lines require a lot of modification.

Proposed solution

For me personally, the following customisations would be useful:

  1. Custom stop sequence:
    • For example, I would use the sequence "\n" to allow autocomplete to generate only a single line per completion.
    • Or "\n\n" would allow autocomplete to generate until the end of the current block (I am on Python).
    • For other languages it may be "}", to allow autocomplete only until the end of a block.
    • I do not know about other LLMs, but OpenAI supports this via the "stop" parameter in its API (a minimal sketch follows this list).
  2. Hotkey for autocompletion: It would be cool to have an option where auto-completion is usually turned off and is triggered only when I explicitly need it (by hotkey). It would save me LLM tokens and would not spam my editor when I do not need it. For now, turning it on and off requires several mouse actions, which "is not as hot" as a keyboard combination would be.
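A minimal sketch of how such a stop sequence works, assuming the official openai Python package; the model name and prompt are placeholders, not CodeGPT's actual request:

```python
# Illustrative sketch only: assumes the `openai` package and an
# OPENAI_API_KEY in the environment; model and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Complete: total = sum("}],
    # Generation stops as soon as a newline would be emitted,
    # so the completion is at most a single line.
    stop=["\n"],
)
print(response.choices[0].message.content)
```

Passing `stop=["\n\n"]` instead would let the model continue until the end of the current block, as described above.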

Additional context

CodeGPT is awesome!

dimitree54 avatar Mar 03 '24 12:03 dimitree54

+1 for hotkey support: one to trigger a completion, one to accept it line by line.

wsxqyws avatar Mar 04 '24 03:03 wsxqyws

Thank you for reporting!

The feature isn't really meant for public use at the moment, hence the lack of documentation. It still lacks pre- and post-processing, and I'm not sure how to deal with this problem yet.

I would consider the option to configure when the completion should stop a temporary solution rather than the correct solution. Constantly hitting TAB and memorizing the correct hotkeys for certain configurations is simply too much to handle while maintaining a consistent workflow. For now, it could be useful, yes.

Recently, I have been playing around with tree-sitter and how to use it to create a syntax tree for a repository that can later be used to reference certain parts of your code for the original prompt. The same kind of parsing can also be applied to code completion. For example, we can track the user's cursor position within the syntax tree or what the user is attempting to write, such as function arguments, import declarations, comments, etc. Thus, we can distinguish whether the user is requesting multi-line or single-line completion. The only downside is that the implementation is language-specific.
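For illustration, here is a rough sketch of this kind of cursor-position lookup with tree-sitter's Python bindings (the tree_sitter_languages helper package and the mode heuristic at the end are assumptions for the sketch, not the plugin's actual code):

```python
# Rough sketch, assuming the tree_sitter_languages helper package;
# the single-/multi-line heuristic below is illustrative only.
from tree_sitter_languages import get_parser

parser = get_parser("python")
code = b"def add(a, b):\n    return a + b\n"
tree = parser.parse(code)

# Innermost syntax node at the cursor (row 1, column 11: inside "a").
cursor = (1, 11)
node = tree.root_node.descendant_for_point_range(cursor, cursor)

# Climb the tree to see which constructs enclose the cursor.
enclosing = []
n = node
while n is not None:
    enclosing.append(n.type)
    n = n.parent
print(enclosing)
# e.g. ['identifier', 'binary_operator', 'return_statement',
#       'block', 'function_definition', 'module']

# A completion provider could map these node types to a mode, e.g.
# comments and argument lists -> single-line completion,
# an empty function body -> multi-line completion.
single_line = any(t in ("argument_list", "comment", "binary_operator") for t in enclosing)
```

The node type names come from each language's grammar, which is exactly why the implementation ends up language-specific.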

carlrobertoh avatar Mar 04 '24 09:03 carlrobertoh

Hi, @carlrobertoh, thank you for your awesome project.

Your idea of smartly choosing how much to auto-complete sounds cool and ambitious. I believe it would be useful for a lot of users. But from my personal experience, I am rarely happy with full-function completions from GPT-4 and almost always prefer single-line ones. So the possibility to customise the behaviour to each user's preferences still looks very useful to me. Actually, I highly prefer CodeGPT over other AI tools (GitHub Copilot, JetBrains AI Assistant) exactly because of such fine control over code generation, rather than having everything decided for the user.

As for hotkeys, I personally prefer to keep auto-completion turned off most of the time to avoid spam in my editor. Regardless of how good auto-completion is, some code is better written by myself and without distractions, in my opinion. So a hotkey for easily switching it on and off would be a real time-saver for me. I would love to memorise the correct hotkey for such a useful tool.

dimitree54 avatar Mar 09 '24 08:03 dimitree54

@dimitree54 Can you confirm that the hotkeys are working in CodeGPT 2.5.1?

  • Open Settings and filter for completions
  • There should be 2 results under Plugins / CodeGPT: Disable Completions and Enable Completions

reneleonhardt avatar Apr 06 '24 10:04 reneleonhardt

@reneleonhardt Yes, I have already tried it, and it works. Thank you. That way it is more comfortable for me to use completions than having them always on. But the downside of this switchable approach, I have realised, is that you need to wait around 5 seconds before a completion is generated.

dimitree54 avatar Apr 06 '24 12:04 dimitree54

To add to this: I'm not sure how difficult it would be, but it would be great to have an option to use a different (smaller) local model for completions, keeping the other (larger) one for things like chat.

seanrclayton avatar May 01 '24 20:05 seanrclayton

> To add to this: I'm not sure how difficult it would be, but it would be great to have an option to use a different (smaller) local model for completions, keeping the other (larger) one for things like chat.

That's the best idea, because some models are exclusively for code completion (very fast for one-line answers) and others only for chatting/instruction following (the -it suffix): https://huggingface.co/google/codegemma-7b-it#description

reneleonhardt avatar May 06 '24 06:05 reneleonhardt

> @reneleonhardt Yes, I have already tried it, and it works. Thank you. That way it is more comfortable for me to use completions than having them always on. But the downside of this switchable approach, I have realised, is that you need to wait around 5 seconds before a completion is generated.

I would propose utilizing IntelliJ's built-in lookup suggestions feature, where you get suggestions from all available InlineCompletionProviders. Currently, CodeGPT's InlineCompletionProvider ignores it when you press CTRL+SPACE, which opens up IntelliJ's suggestions.

I tested this by telling CodeGPT's InlineCompletionProvider to also trigger on a lookup while code completions are disabled (see the attached code-completion-shortcut recording).

Thereby, pressing CTRL+SPACE triggers the code completion manually.

> To add to this: I'm not sure how difficult it would be, but it would be great to have an option to use a different (smaller) local model for completions, keeping the other (larger) one for things like chat.

Do you mean, e.g., using a llama.cpp server locally with CodeLlama for code completions while using, e.g., ChatGPT for the chat? Or do you want to use the local Llama server with two models at the same time? That would mean loading/off-loading models all the time, since llama.cpp doesn't support parallel models. I think Ollama supports it by now, but I have no idea what the memory requirements are.

Crustack avatar May 07 '24 14:05 Crustack

> That's the best idea, because some models are exclusively for code completion (very fast for one-line answers) and others only for chatting/instruction following (the -it suffix): https://huggingface.co/google/codegemma-7b-it#description

As you can see in the example, some models are exclusively for code completion or for instruction following, not both. I didn't propose a general solution for all providers, just pointed out a problem with the current usage of only one selected provider+model in CodeGPT 😅

Some code completion models are much smaller than (good) instruction-following models, so for useful combinations the memory requirements wouldn't be much bigger (a few GB).

I don't think the two models have to be loaded at the same time; they would just have to be switched depending on what the user is doing (see the sketch after this list):

  • User is coding (document focused, not the chat window)
  • User is chatting (chat window focused, not a document)
  • User changes focus -> load model selected for focus
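A minimal sketch of this switching against Ollama's HTTP API, assuming a local Ollama server on the default port; the model names and the idea of deriving `chatting` from editor focus are placeholder assumptions, not CodeGPT internals:

```python
# Minimal sketch: assumes a local Ollama server on the default port and
# the requests package; the model names are illustrative placeholders.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
COMPLETION_MODEL = "codegemma:2b"    # small, fast code-completion model
CHAT_MODEL = "codegemma:7b-it"       # larger instruction-tuned model

def generate(prompt: str, chatting: bool) -> str:
    # Ollama loads the requested model on demand, so simply naming a
    # different model per request implements the "switch on focus" idea.
    model = CHAT_MODEL if chatting else COMPLETION_MODEL
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(generate("def fib(n):", chatting=False))           # document focused
print(generate("Explain memoization.", chatting=True))   # chat window focused
```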

In any case, I am missing a server status icon that is visible at all times (showing whether the server is stopped, or running and wasting resources); it could also be used perfectly for this proposed focus-change action (stopping and starting automatically).

reneleonhardt avatar May 07 '24 19:05 reneleonhardt

@PhilKes, hello! You wrote:

> I tested this by telling CodeGPT's InlineCompletionProvider to also trigger on a lookup while code completions are disabled

I want to set up this trigger for my CodeGPT too. What should I do to trigger CodeGPT's autocomplete when I hit Ctrl + Space (or some other hotkey)?

ishatalkin avatar Jan 24 '25 11:01 ishatalkin