Add LiteLLM - support for Ollama, Vertex AI, Gemini, Anthropic, Bedrock (100+ LLMs)
Description
This PR adds support for the above-mentioned LLMs using LiteLLM (https://github.com/BerriAI/litellm/). LiteLLM is a lightweight package that simplifies LLM API calls: use any LLM as a drop-in replacement for gpt-4o.
- Resolves https://github.com/microsoft/graphrag/issues/657
Example
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="openai/gpt-4o", messages=messages)
# anthropic call
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
print(response)
Response (OpenAI Format)
{
  "id": "chatcmpl-565d891b-a42e-4c39-8d14-82a1f5208885",
  "created": 1734366691,
  "model": "claude-3-sonnet-20240229",
  "object": "chat.completion",
  "system_fingerprint": null,
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! As an AI language model, I don't have feelings, but I'm operating properly and ready to assist you with any questions or tasks you may have. How can I help you today?",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null
      }
    }
  ],
  "usage": {
    "completion_tokens": 43,
    "prompt_tokens": 13,
    "total_tokens": 56,
    "completion_tokens_details": null,
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 0
    },
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}
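Since every provider's reply comes back in the OpenAI format shown above, the same field lookups work whichever model was called. A minimal sketch of that, assuming the response is handled as a plain dict (the object LiteLLM actually returns may expose these fields differently); the message content is shortened here for readability:

```python
# Hand-built dict mirroring the OpenAI-format response above.
# Assumption: plain-dict access; the content string is shortened for the example.
response = {
    "model": "claude-3-sonnet-20240229",
    "object": "chat.completion",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I help you today?",
            },
        }
    ],
    "usage": {"completion_tokens": 43, "prompt_tokens": 13, "total_tokens": 56},
}

# The assistant's text lives in the first choice's message.
reply = response["choices"][0]["message"]["content"]

# Token accounting is reported under "usage".
usage = response["usage"]
total_tokens = usage["prompt_tokens"] + usage["completion_tokens"]

print(reply)
print(total_tokens)
```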
Related Issues
https://github.com/microsoft/graphrag/issues/657
Checklist
- [x] I have tested these changes locally.
- [ ] I have reviewed the code changes.
- [ ] I have updated the documentation (if necessary).
- [ ] I have added appropriate unit tests (if applicable).
Can I get a review on this @natoverse ?
This worked for me with ollama! I had to change the model from gemma2:27b to ollama/gemma2:27b and also pip install litellm to make it work.
EDIT: I needed to use gemma2:27b for indexing and ollama/gemma2:27b for querying, not sure why.
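The two spellings differ only in the provider prefix: the model strings in the PR description follow a provider/model pattern (openai/gpt-4o, anthropic/claude-3-sonnet-20240229). A rough sketch of how such a string splits, using a hypothetical helper that is not LiteLLM's own code; the "openai" fallback for unprefixed names is an assumption made for illustration:

```python
def split_model(model: str, default_provider: str = "openai") -> tuple[str, str]:
    """Split 'provider/name' into (provider, name).

    Hypothetical helper for illustration only. The default provider used
    for unprefixed names is an assumption, not documented behaviour.
    """
    provider, sep, name = model.partition("/")
    if not sep:
        # No prefix present: fall back to the assumed default provider.
        return default_provider, model
    return provider, name


print(split_model("ollama/gemma2:27b"))
print(split_model("gemma2:27b"))
```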
Hey @timjrobinson @ishaan-jaff
Thanks a lot for this feature, it's really helpful!
I'd like to try it out with Ollama or Gemini.
However, I'm struggling with the configuration. I'm unsure about which parameters I need to change to use my Ollama instance or the Gemini model. I'm encountering errors.
Here are the tests I've done so far:
My settings.yaml file:
llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: openai_chat # or azure_openai_chat
  model: ollama/gemma2:27b
  model_supports_json: true # recommended if this is available for your model.
I'm still getting errors in all cases. Most are errors from the LLM call saying the OPENAPI_KEY is in the wrong format, or errors like this one:
Here's an example of the output:
{
  "type": "error",
  "data": "Entity Extraction Error",
  "stack": "...",
  "source": "404 page not found",
  "details": {
    "doc_index": 0,
    "text": "..."
  }
}
I've tried replacing the model with gemini/gemini and ensured my API key is set in the .env file, but I'm still getting errors whenever I run the script with graphrag index --root ./ragtest/.
Any help would be appreciated 😄 !
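One thing worth ruling out with a config like the one above is an unresolved ${GRAPHRAG_API_KEY} placeholder. A small sketch of that check, using Python's string.Template as a stand-in for the substitution step; the resolve helper is hypothetical, and GraphRAG's actual config loader may behave differently:

```python
import os
from string import Template


def resolve(value: str, env=None) -> str:
    """Expand ${VAR} placeholders from env (defaults to os.environ).

    Hypothetical helper for illustration; not GraphRAG's loader.
    """
    env = os.environ if env is None else env
    try:
        return Template(value).substitute(env)
    except KeyError as missing:
        # Fail loudly rather than passing the literal placeholder on as a key.
        raise ValueError(f"environment variable {missing} is not set") from None


# With the variable defined, the placeholder expands to its value.
print(resolve("${GRAPHRAG_API_KEY}", {"GRAPHRAG_API_KEY": "my-key"}))
```

If the variable is missing, the helper raises instead of sending the literal placeholder text to the provider as an API key.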
Does this work during indexing?
This is a very good feature, would love to see this merged, if everything's good.
@natoverse Gentle nudge for review.
@ishaan-jaff I can't even find litellm in pyproject.toml or poetry.lock... does this new import work? 😅 Could you rebase or resolve the conflicts so the tests can run again?
@timjrobinson @ishaan-jaff is it possible to provide a Gemini config example?
Thanks for the suggestion - we are adding LiteLLM support with #2051, but we also needed to include a number of rate limiting and retry features to support larger pipeline runs.