
The OpenAI model endpoint is not allowed to change; how should I add third-party LLMs?

Open freemank1224 opened this issue 1 year ago • 16 comments

I notice that I can modify some files in the src folder, but this still doesn't take effect when I use data_formulator. Can you give users the opportunity to use third-party endpoints?

freemank1224 avatar Nov 05 '24 05:11 freemank1224

To modify the source code and have your changes reflected in Data Formulator, you should follow the development build (instead of using pip install data_formulator).

You can check out the development build here https://github.com/microsoft/data-formulator/blob/main/DEVELOPMENT.md

You'll need to complete both the backend and frontend builds, then visit http://localhost:3000/ to see the live version.

A tip: if you make updates to the backend, you'll need to restart the backend server with .\local_server.sh for the changes to show up in the custom Data Formulator you are developing.

Chenglong-MS avatar Nov 05 '24 16:11 Chenglong-MS

Where in the app can I change the endpoint, e.g. to use an Ollama model?

pmccabe157 avatar Feb 10 '25 13:02 pmccabe157

I would love to see Ollama supported on the back end as well, please.

ramixpe avatar Feb 10 '25 14:02 ramixpe

Here is the place to update the endpoint to use other models: https://github.com/microsoft/data-formulator/blob/main/py-src/data_formulator/agents/client_utils.py

There seems to be enough interest. We can update Data Formulator to support Ollama as part of the new version release.
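As a rough sketch of how endpoint-based routing could look, the client could infer the provider from the configured URL before constructing the request. The helper name and provider strings below are hypothetical, not the actual client_utils.py API:

```python
from urllib.parse import urlparse

# Hypothetical helper: guess which provider a custom endpoint points at,
# so the client can be constructed accordingly.
def infer_provider(endpoint: str, default: str = "openai") -> str:
    host = urlparse(endpoint).netloc.lower()
    # Ollama's default local port is 11434
    if "ollama" in host or host.endswith(":11434"):
        return "ollama"
    if "azure" in host:
        return "azure"
    return default
```

For example, `infer_provider("http://localhost:11434")` would route to Ollama, while the default OpenAI endpoint falls through unchanged.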

Chenglong-MS avatar Feb 10 '25 18:02 Chenglong-MS

I would love to see a way to use other LLMs, the ability to change the API URL, and local use with Ollama.

82ndAirborneDiv avatar Feb 10 '25 18:02 82ndAirborneDiv

Yeah, that sounds great. I also find Ollama a good addition, given that open-source models perform much better nowadays.

Community: for closed-source models, if you have suggestions on priorities, feel free to share them. We could potentially start with Claude models or Hugging Face keys.

Chenglong-MS avatar Feb 10 '25 19:02 Chenglong-MS

Azure OpenAI would be great (and Microsoft related 😊)

alikalik9 avatar Feb 11 '25 17:02 alikalik9

To enable seamless integration with additional LLMs, we propose updating the LLM client initialization in client_utils.py to accept custom endpoints and model parameters via LiteLLM's unified interface. Additionally, we propose a user-friendly UI settings panel where users can input the endpoint URL (e.g., http://localhost:11434 for Ollama), API key, and model name. This enhancement lets users switch between OpenAI, local models, and third-party providers.

Example environment variables (Ollama defaults):

export LLM_ENDPOINT="http://localhost:11434"
export LLM_MODEL="llama2"
export LLM_API_KEY=""

Example LiteLLM configuration:

from litellm import completion
import os

def run_llm_completion(messages):
    custom_endpoint = os.getenv("LLM_ENDPOINT", "https://api.openai.com/v1")
    api_key = os.getenv("LLM_API_KEY", "")
    model = os.getenv("LLM_MODEL", "gpt-4")  # default model

    # LiteLLM routes Ollama requests via an "ollama/" model prefix.
    # Match the default Ollama port too, since an endpoint like
    # http://localhost:11434 doesn't contain the string "ollama".
    if "ollama" in custom_endpoint.lower() or ":11434" in custom_endpoint:
        model = f"ollama/{model}"

    return completion(
        model=model,
        messages=messages,        # completion() requires the chat messages
        api_base=custom_endpoint,
        api_key=api_key or None,
    )

Chittaranjans avatar Feb 11 '25 18:02 Chittaranjans

yes, this sounds like a very good approach. We'll test it out and do a PR soon.

Chenglong-MS avatar Feb 11 '25 18:02 Chenglong-MS

Okay.

Chittaranjans avatar Feb 11 '25 19:02 Chittaranjans

A work in progress supporting non-OpenAI models is here: https://github.com/microsoft/data-formulator/tree/dev

I have tested it with Ollama/OpenAI/Azure OpenAI; I'll do a bit more testing before opening a PR.

Check out the client support here: https://github.com/microsoft/data-formulator/blob/dev/py-src/data_formulator/agents/client_utils.py
The frontend update can be tracked at: https://github.com/microsoft/data-formulator/blob/dev/src/views/ModelSelectionDialog.tsx

Chenglong-MS avatar Feb 12 '25 02:02 Chenglong-MS

This is what I planned and tested; it works for all LLMs. Only the UI part for model selection needs to be updated to match client_utils.py.

Chittaranjans avatar Feb 12 '25 03:02 Chittaranjans

Check out this PR: https://github.com/microsoft/data-formulator/pull/81, it should work now! Will merge to main soon.


I have updated throughout (UI, agents, utils) so that Data Formulator can work with custom models.

So far, my experience is that models with good code-generation and instruction-following capabilities work best (gpt-4o, gpt-4o-mini, claude-3-5-sonnet, etc.).

Small local models (llama3.2, qwen2.5-coder:3b, codellama:7b) tend to ignore instructions to generate code and can fail frequently on data formulation steps.
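One common mitigation for models that reply with prose instead of code is to validate the response before executing it. This is an illustrative guard, not Data Formulator's actual implementation:

```python
import ast

# Illustrative guard: small local models sometimes return prose instead of
# runnable code, so check that the reply parses as Python before using it.
def is_valid_python(code: str) -> bool:
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False
```

A failed check could then trigger a retry with a stronger prompt or a fallback model.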

Chenglong-MS avatar Feb 12 '25 23:02 Chenglong-MS

If you need to incorporate third-party LLMs without changing OpenAI's fixed endpoint, you can use middleware to route requests based on task requirements, call different LLMs selectively within your app, run local models if supported, or combine responses from multiple LLMs for richer output.
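The middleware idea could be sketched as a task-to-model routing table; all model names and task labels here are illustrative, not Data Formulator's actual configuration:

```python
# Illustrative task-based router: each agent/task type maps to the model
# best suited for it, instead of forcing every request through one endpoint.
ROUTES = {
    "code_generation": "gpt-4o",
    "data_transformation": "ollama/qwen2.5-coder:3b",
    "chat": "gpt-4o-mini",
}

def route_model(task: str, default: str = "gpt-4o-mini") -> str:
    """Return the model configured for a task, falling back to a default."""
    return ROUTES.get(task, default)
```

Each agent would then look up its model at request time, so local and hosted models can coexist behind one interface.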

giyosphere avatar Feb 13 '25 00:02 giyosphere

Please check out the new release with custom model support: https://github.com/microsoft/data-formulator/releases/tag/0.1.5

@giyosphere made a good point. It would be good to batch some queries and adaptively choose a model based on task requirements (different agents may have more suitable models). That would be a good feature improvement.

Chenglong-MS avatar Feb 13 '25 00:02 Chenglong-MS

Hello,

At my company, we use OpenWebUI, which provides a chatbot GUI and exposes LLM APIs in both OpenAI and Ollama formats. Additionally, it allows us to secure these endpoints with an API key.

However, I’ve encountered issues configuring MS DataFormulator with both the Ollama and OpenAI endpoints:

  • The Ollama endpoint does not support setting an API key.
  • The OpenAI endpoint does not allow changing the base URL.

Would it be possible to either:

  • Enable API key support for the Ollama endpoint, or
  • Allow customization of the base URL for the OpenAI endpoint?

This would greatly help in integrating MS DataFormulator with OpenWebUI.

Thank you for your consideration!

Best regards,
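As a possible workaround until API-key support lands, an OpenAI-format endpoint such as the one OpenWebUI exposes can usually be reached by sending the bearer key manually. A minimal sketch, where the base URL, key, and model are placeholders:

```python
import json
import urllib.request

# Sketch only: build (not send) a chat request against an OpenAI-format
# endpoint, e.g. one exposed by OpenWebUI, with API-key authentication.
def build_chat_request(base_url, api_key, model, messages):
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # bearer-key auth
        },
    )
```

The returned request can be sent with urllib.request.urlopen, or the same headers can be set on any HTTP client.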


banalg avatar Feb 19 '25 15:02 banalg