graphrag: Add LiteLLM support for Ollama, Vertex AI, Gemini, Anthropic, Bedrock (100+ LLMs)

Description

This PR adds support for the above-mentioned LLMs using LiteLLM (https://github.com/BerriAI/litellm/). LiteLLM is a lightweight package that simplifies LLM API calls, letting you use any LLM as a drop-in replacement for gpt-4o.

Resolves https://github.com/microsoft/graphrag/issues/657

Example

from litellm import completion
import os

# set environment variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="openai/gpt-4o", messages=messages)
print(response)

# anthropic call
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
print(response)

Response (OpenAI Format)

{
    "id": "chatcmpl-565d891b-a42e-4c39-8d14-82a1f5208885",
    "created": 1734366691,
    "model": "claude-3-sonnet-20240229",
    "object": "chat.completion",
    "system_fingerprint": null,
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "Hello! As an AI language model, I don't have feelings, but I'm operating properly and ready to assist you with any questions or tasks you may have. How can I help you today?",
                "role": "assistant",
                "tool_calls": null,
                "function_call": null
            }
        }
    ],
    "usage": {
        "completion_tokens": 43,
        "prompt_tokens": 13,
        "total_tokens": 56,
        "completion_tokens_details": null,
        "prompt_tokens_details": {
            "audio_tokens": null,
            "cached_tokens": 0
        },
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0
    }
}
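Because LiteLLM returns every provider's reply in this OpenAI format, downstream code can read Claude, Gemini or Ollama output the same way. A minimal sketch, assuming you are working with the reply as a plain JSON dict like the one printed above (trimmed to the fields used); the helper names are hypothetical and not part of this PR:

```python
import json

# The reply shown above, trimmed to the fields this sketch reads.
RAW = """
{
  "model": "claude-3-sonnet-20240229",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! As an AI language model, I don't have feelings, but I'm operating properly and ready to assist you with any questions or tasks you may have. How can I help you today?",
        "role": "assistant"
      }
    }
  ],
  "usage": {"completion_tokens": 43, "prompt_tokens": 13, "total_tokens": 56}
}
"""


def first_reply(payload: dict) -> str:
    """Return the assistant text of the first choice in an OpenAI-format reply."""
    message = payload["choices"][0]["message"]
    if message["role"] != "assistant":
        raise ValueError(f"unexpected role: {message['role']}")
    return message["content"]


def usage_is_consistent(payload: dict) -> bool:
    """Check that prompt + completion tokens add up to the reported total."""
    usage = payload["usage"]
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]


reply = json.loads(RAW)
print(first_reply(reply))
print(usage_is_consistent(reply))
```

The same two helpers work unchanged whichever provider prefix produced the reply, which is the point of routing everything through one response shape.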

Related Issues

https://github.com/microsoft/graphrag/issues/657

Checklist

[x] I have tested these changes locally.
[ ] I have reviewed the code changes.
[ ] I have updated the documentation (if necessary).
[ ] I have added appropriate unit tests (if applicable).


Jan 01 '25 17:01 ishaan-jaff

Can I get a review on this, @natoverse?

Jan 01 '25 17:01 ishaan-jaff

This worked for me with Ollama! I had to change the model from gemma2:27b to ollama/gemma2:27b and also run pip install litellm to make it work.

EDIT: I needed to use gemma2:27b for indexing and ollama/gemma2:27b for querying; I'm not sure why.

Jan 13 '25 08:01 timjrobinson
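The ollama/ prefix in timjrobinson's comment is how LiteLLM routes a request to Ollama. A minimal sketch of a direct LiteLLM call against a local Ollama server, assuming Ollama's default address http://localhost:11434; the helper names are hypothetical, and this does not attempt to explain why indexing needed the bare model name:

```python
from typing import Any

# Assumed default address of a local Ollama server; change it if yours differs.
DEFAULT_OLLAMA_BASE = "http://localhost:11434"


def ollama_model_id(model: str) -> str:
    """Add LiteLLM's 'ollama/' provider prefix unless it is already present."""
    return model if model.startswith("ollama/") else f"ollama/{model}"


def ollama_request(model: str, prompt: str, api_base: str = DEFAULT_OLLAMA_BASE) -> dict[str, Any]:
    """Build keyword arguments for litellm.completion() aimed at a local Ollama server."""
    return {
        "model": ollama_model_id(model),
        "messages": [{"role": "user", "content": prompt}],
        "api_base": api_base,
    }


def ask_ollama(model: str, prompt: str) -> str:
    """Send one prompt through LiteLLM (needs `pip install litellm` and a running Ollama)."""
    from litellm import completion  # lazy import so the helpers above work without litellm

    response = completion(**ollama_request(model, prompt))
    return response["choices"][0]["message"]["content"]
```

With this, ask_ollama("gemma2:27b", "Hello") and ask_ollama("ollama/gemma2:27b", "Hello") send the same model string, so the prefix no longer has to be remembered by hand.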

Hey @timjrobinson @ishaan-jaff

Thanks a lot for this feature, it's really helpful!

I'd like to try it out with Ollama or Gemini.

However, I'm struggling with the configuration: I'm not sure which parameters I need to change to use my Ollama instance or the Gemini model, and I keep hitting errors.

Here are the tests I've done so far:

My settings.yaml file:

llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: openai_chat  # or azure_openai_chat
  model: ollama/gemma2:27b
  model_supports_json: true # recommended if this is available for your model.

I'm still getting errors in all cases. Most are errors from the LLM call saying the OpenAI API key is in the wrong format, or errors like this one:

Here's an example of the output:

{
  "type": "error",
  "data": "Entity Extraction Error",
  "stack": "...",
  "source": "404 page not found",
  "details": {
    "doc_index": 0,
    "text": "..."
  }
}

I've tried replacing the model with gemini/gemini and made sure my API key is set in the .env file, but I still get errors whenever I run graphrag index --root ./ragtest/.

Any help would be appreciated 😄 !

Jan 27 '25 16:01 GridexX
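One thing worth ruling out in GridexX's setup: LiteLLM reads provider keys from provider-specific environment variables (the PR example sets OPENAI_API_KEY and ANTHROPIC_API_KEY), so a key stored only as GRAPHRAG_API_KEY may never reach a gemini/ model, depending on how this PR passes api_key through. A hypothetical pre-flight check follows; the helper names are illustrative, the GEMINI_API_KEY entry is an assumption to verify against LiteLLM's provider docs, and "gemini-1.5-flash" is only an example model name:

```python
import os
from typing import Mapping, Optional

# Assumed mapping from LiteLLM provider prefix to the environment variable that
# LiteLLM reads for that provider's key. None means no key is needed (local Ollama).
# Providers not listed here are simply not checked.
PROVIDER_KEY_ENV: dict[str, Optional[str]] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "ollama": None,
}


def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model' on the first slash.

    Bare names are treated as OpenAI models here; this is a simplification,
    since LiteLLM itself can infer the provider for many bare model names.
    """
    provider, sep, name = model.partition("/")
    if not sep:
        return "openai", model
    return provider, name


def missing_key(model: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key variable this model needs but that is unset, else None."""
    env = os.environ if env is None else env
    provider, _ = split_model(model)
    var = PROVIDER_KEY_ENV.get(provider)
    if var is not None and not env.get(var):
        return var
    return None


# A key set only under GRAPHRAG_API_KEY does not satisfy the gemini/ provider here.
print(missing_key("gemini/gemini-1.5-flash", {"GRAPHRAG_API_KEY": "set-in-.env"}))
```

Passing an explicit env mapping keeps the check testable; calling missing_key(model) with no second argument checks the real process environment instead.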

Does this work during indexing?

Feb 04 '25 13:02 todap

This is a very good feature; I would love to see it merged if everything checks out.

Mar 22 '25 21:03 sskop99

@natoverse Gentle nudge for review.

Mar 22 '25 21:03 sskop99

@ishaan-jaff I can't find litellm in pyproject.toml or poetry.lock, so does this new import actually work? 😅 Could you rebase or resolve the conflicts so the tests can run again?

Apr 29 '25 05:04 reneleonhardt
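Since litellm is not declared in pyproject.toml, a fresh install will not have it (which matches timjrobinson needing a manual pip install litellm). The real fix is declaring the dependency, e.g. with poetry add litellm; until then, a guard like the hypothetical sketch below turns a bare ImportError into an actionable message. The function names are illustrative, not from the PR:

```python
import importlib.util


def litellm_available() -> bool:
    """Return True if the litellm package can be imported in this environment."""
    return importlib.util.find_spec("litellm") is not None


def require_litellm() -> None:
    """Fail early with an actionable message instead of a bare ImportError."""
    if not litellm_available():
        raise ImportError(
            "LiteLLM-backed models need the 'litellm' package; install it with "
            "`pip install litellm` or declare it as a project dependency."
        )
```

Calling require_litellm() once when a LiteLLM model is configured surfaces the problem at startup rather than partway through an indexing run.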

@timjrobinson @ishaan-jaff Is it possible to provide a Gemini config example?

May 20 '25 20:05 haimco50
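How this PR maps settings.yaml onto LiteLLM is not shown in the thread, so the sketch below sticks to a direct LiteLLM call for Gemini via Google AI Studio, which LiteLLM addresses with the gemini/ prefix and a GEMINI_API_KEY environment variable. The helper names are hypothetical, and "gemini-1.5-flash" is only an illustrative model name; substitute one your key can access:

```python
import os
from typing import Any

# Illustrative model name only; replace with a Gemini model your key can use.
GEMINI_MODEL = "gemini/gemini-1.5-flash"


def gemini_request(prompt: str, model: str = GEMINI_MODEL) -> dict[str, Any]:
    """Build keyword arguments for litellm.completion() targeting Gemini (AI Studio)."""
    if not model.startswith("gemini/"):
        raise ValueError("LiteLLM routes Google AI Studio models via the 'gemini/' prefix")
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def ask_gemini(prompt: str) -> str:
    """Send one prompt; LiteLLM reads the key from the GEMINI_API_KEY environment variable."""
    if not os.environ.get("GEMINI_API_KEY"):
        raise RuntimeError("set GEMINI_API_KEY before calling Gemini through LiteLLM")
    from litellm import completion  # lazy import: only needed for the live call

    response = completion(**gemini_request(prompt))
    return response["choices"][0]["message"]["content"]
```

Vertex AI models use a different prefix and credentials in LiteLLM, so this sketch covers only the AI Studio route.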

Thanks for the suggestion. We are adding LiteLLM support in #2051, but we also needed to include a number of rate-limiting and retry features to support larger pipeline runs.

Sep 19 '25 16:09 natoverse