Feature Request: 🛠 Support for local LLM tools, like Ollama, in Terminal Chat
Description of the new feature/enhancement
The Windows Terminal Chat currently only supports Azure OpenAI Service. This restriction limits developers who work with or are developing their own local Large Language Models (LLMs), or who use tools such as Ollama and need to interface with them directly within the Terminal. The ability to connect to a local LLM service would allow for greater flexibility, especially for those concerned with privacy, working offline, or dealing with sensitive information that cannot be sent to cloud services.
Proposed technical implementation details (optional)
Include functionality to support local LLM services by allowing users to configure a connection to local AI models. This would involve:
- Providing an option in the Terminal Chat settings to specify the endpoint of a local LLM service.
- Allowing the user to set the port on which the local LLM service listens for incoming requests (see the sketch below).
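As a rough illustration only (the setting names and defaults here are mine, not the Terminal's; Ollama's default port 11434 is used as the example), this is roughly how a user-configured endpoint and port could be combined into the URL Terminal Chat would call and sanity-checked before use:

```python
import socket
from urllib.parse import urlparse

def build_chat_url(endpoint: str = "http://localhost", port: int = 11434) -> str:
    """Combine the two proposed settings into the URL Terminal Chat would call."""
    return f"{endpoint}:{port}/v1/chat/completions"

def is_listening(endpoint: str, port: int, timeout: float = 2.0) -> bool:
    """Cheap pre-flight check that something is actually listening on that port."""
    host = urlparse(endpoint).hostname or endpoint
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(build_chat_url())                         # http://localhost:11434/v1/chat/completions
print(is_listening("http://localhost", 11434))  # True if a local LLM service is running
```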
Thanks!
Would love to see this feature. Phi models would be great for this.
As a workaround, I set up https://github.com/g0t4/term-chat-ollama as an intermediate "proxy" that can forward requests to any OpenAI-compatible completions backend, e.g. Ollama, OpenAI, groq.com, etc.
FYI, video overview here: https://youtu.be/-QcSRmrsND0
@dossjj with this, you can use phi3 by setting the endpoint to https://fake.openai.azure.com:5000/answer?model=phi3
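For anyone curious what that kind of proxy boils down to, below is a minimal sketch of the forwarding idea only (not the linked project, which also does more, such as mapping the `model` query parameter seen in the endpoint above). It assumes a local Ollama instance on port 11434 as the upstream:

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://localhost:11434"  # assumption: a local Ollama instance

class Forwarder(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the incoming request body and relay it, path included, upstream.
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        upstream_req = urllib.request.Request(
            UPSTREAM + self.path,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(upstream_req, timeout=120) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    # Anything expecting an OpenAI-style endpoint can now point at localhost:5000.
    HTTPServer(("localhost", 5000), Forwarder).serve_forever()
```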
Please just implement this as an OpenAI-compatible interface.
- Ollama has supported this since February 2024
- LiteLLM is used as a proxy to many hosted backends for businesses
- Anthropic models can be put behind LiteLLM and invoked this way

They simply require these three pieces of information:
OPENAI_BASE_URL="http://some-endpoint/api" OPENAI_KEY="12345" OPENAI_MODEL="gpt-4o"
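To make that concrete, here is a minimal sketch (not a proposed implementation) that reads exactly those three values and issues an OpenAI-style chat completion. The defaults are assumptions: a local Ollama serving its OpenAI-compatible API with phi3 already pulled.

```python
import json
import os
import urllib.request

# The same three settings, read from the environment (names as in the comment above).
base_url = os.environ.get("OPENAI_BASE_URL", "http://localhost:11434/v1")
api_key = os.environ.get("OPENAI_KEY", "")      # may be empty for local backends
model = os.environ.get("OPENAI_MODEL", "phi3")  # assumption: model is already available

body = json.dumps({
    "model": model,
    "messages": [{"role": "user", "content": "Hello from Terminal Chat"}],
}).encode()
req = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=body,
    headers={
        "Content-Type": "application/json",
        # Hosted backends need the key; local ones typically ignore the header.
        "Authorization": f"Bearer {api_key or 'unused'}",
    },
)
with urllib.request.urlopen(req, timeout=60) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```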
Just wanted to add my $.02.
I decided to try a POC: I manually updated the hard-coded OpenAI URL and model in src/cascadia/QueryExtension/OpenAILLMProvider.cpp to point at my local Ollama OpenAI-compatible endpoint and a model I already had, did a build, and was able to use the chat feature perfectly with Ollama as the back-end. To flesh out the request in a bit more detail...
1. Make the `openAIEndpoint` user-configurable - at least as a base URL (e.g. https://ollama.mydomain.com or http://localhost:11434). Up to you whether or not you want to hard-code the `/v1` part of the full URI, or have the user include it in the input.
   - Personally, I'd probably lean towards hard-coding that bit, as most 3rd-party LLM apps that have an OpenAI-compliant API also include the `/v1` in their custom APIs.
2. Make the OpenAI API token an optional field. It's not required for Ollama and many others; if they don't have API authorization configured, they tend to just ignore the authorization header entirely.
3. Instead of hard-coding a specific model, use the OpenAI `List Models` API endpoint to populate what models are available, then let the user select the model from a drop-down menu. The endpoint would be a GET request to `OPENAI_BASE_URL/v1/models` (e.g. https://api.openai.com/v1/models) - see the sketch after this list.
   - In addition to being a nice element to have customizable, the actual OpenAI API endpoint for this request requires a valid authentication header, so it's an easy way to validate the API key at the same time. Ref: https://platform.openai.com/docs/api-reference/models/list
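A minimal sketch of point 3, assuming any OpenAI-compatible base URL (a local Ollama needs no key; api.openai.com rejects the request without a valid one, which is what makes it useful for key validation):

```python
import json
import urllib.error
import urllib.request

def list_models(base_url: str, api_key: str = "") -> list:
    """Return model ids from an OpenAI-compatible /v1/models endpoint."""
    req = urllib.request.Request(f"{base_url}/v1/models")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return [m["id"] for m in json.load(resp)["data"]]
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # A rejected key surfaces here, so the same call doubles as key validation.
            raise ValueError("API key rejected by the backend") from err
        raise

# e.g. against a local Ollama instance (no key needed), to populate the drop-down:
print(list_models("http://localhost:11434"))
```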
@zadjii-msft - Hope this helps!
PS: Implementing part 3 above as described would also resolve https://github.com/microsoft/terminal/issues/18200