How can I evaluate Ollama models such as Llama 3.3 using RAGAS?
@dosu Can you solve this?
You can use RAGAS with Ollama models through their LangChain integrations:

```python
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

ollama_llm = "mistral"
model = ChatOllama(model=ollama_llm)
embeddings_ollama = OllamaEmbeddings(model="mistral")
```
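To actually score something with these, they still need to be handed to `ragas.evaluate`, typically through ragas's LangChain wrappers. A minimal sketch, assuming a one-row toy dataset (the question/answer/contexts strings are placeholders, not from this thread):

```python
from datasets import Dataset
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas import evaluate
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import faithfulness

# Wrap the LangChain objects so ragas can drive them
eval_llm = LangchainLLMWrapper(ChatOllama(model="mistral"))
eval_embeddings = LangchainEmbeddingsWrapper(OllamaEmbeddings(model="mistral"))

# One-row toy dataset; "contexts" is a list of context lists, one per row
dataset = Dataset.from_dict({
    "question": ["What is Ollama?"],
    "answer": ["Ollama runs large language models on local hardware."],
    "contexts": [["Ollama is a tool for running LLMs locally."]],
})

results = evaluate(dataset, metrics=[faithfulness],
                   llm=eval_llm, embeddings=eval_embeddings)
print(results)
```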
This solution was suggested in a similar closed issue and received positive feedback [1].
I successfully ran RAGAS to evaluate an Ollama model. However, I get a timeout error during evaluation and NaN values for the metrics, even after increasing the timeout in the RunConfig.
```
Evaluating:   0%|          | 0/1 [00:00<?, ?it/s]ERROR:ragas.executor:Exception raised in Job[0]: TimeoutError()
{'faithfulness': nan}
```
This is the function I used to calculate the evaluation metrics:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.embeddings import embedding_factory
from ragas.llms import llm_factory
from ragas.metrics import faithfulness
from ragas.run_config import RunConfig

def RAGAS_metrics(user_query, model_response, contexts_list):
    # 6.2 - Run RAGAS metrics
    print("\nRunning RAGAS evaluation metrics...")
    data = {
        "question": [user_query],      # user_query is a string of length 182
        "answer": [model_response],    # model_response is a string of length 3361
        "contexts": contexts_list,     # list of context lists: [[size=10]]
    }
    dataset = Dataset.from_dict(data)
    results = evaluate(
        dataset=dataset,
        metrics=[faithfulness],
        llm=llm_factory(),
        embeddings=embedding_factory(),
        run_config=RunConfig(max_workers=8, timeout=1000, log_tenacity=True),
    )
    print("Evaluation Results from ragas:")
    print(results)
```
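One possible mitigation: a local Ollama server only processes a limited number of requests in parallel, so `max_workers=8` can leave jobs queued behind each other until the per-job timeout fires, which would explain the `TimeoutError` and the resulting `nan`. A hedged sketch that serializes the requests, reusing `dataset`, `llm_factory`, and `embedding_factory` from the function above (the values are starting points to tune, not recommendations):

```python
from ragas import evaluate
from ragas.metrics import faithfulness
from ragas.run_config import RunConfig

# max_workers=1 keeps ragas from queuing behind its own concurrent calls
results = evaluate(
    dataset=dataset,
    metrics=[faithfulness],
    llm=llm_factory(),
    embeddings=embedding_factory(),
    run_config=RunConfig(max_workers=1, timeout=600, max_retries=3),
)
```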
Sadly, this is a duplicate of #1170. Today we don't support Ollama models, but we will get this fixed in the coming weeks.
I tried to work around this with the code below. Although I only use Ollama for the embeddings, I suppose the LLM can be wired up the same way.
```python
import os

from datasets import Dataset
from langchain_community.chat_models import ChatOpenAI
from langchain_ollama import OllamaEmbeddings
from ragas import evaluate
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import answer_correctness, answer_similarity

# Dummy key so ragas doesn't complain; no OpenAI endpoint is actually called
os.environ['OPENAI_API_KEY'] = 'no-key'

# Local Ollama embeddings, wrapped for ragas
ollama_e = OllamaEmbeddings(model="bge-m3", base_url="http://localhost:11434")
ollama_embed_self = LangchainEmbeddingsWrapper(ollama_e)

# Any OpenAI-compatible endpoint can serve as the judge LLM
deepseek_llm = ChatOpenAI(
    api_key=os.getenv('DEEPSEEK_API_KEY'),
    base_url="your_base_url",
    model_name="deepseek-chat",
)
wrapper = LangchainLLMWrapper(deepseek_llm)

# Point each metric at the wrapped LLM and embeddings
metrics = [answer_correctness, answer_similarity]
for m in metrics:
    setattr(m, "llm", wrapper)
    if hasattr(m, "embeddings"):
        setattr(m, "embeddings", ollama_embed_self)

naive_results = evaluate(
    Dataset.from_dict({
        "question": questions,
        "ground_truth": labels,
        "answer": naive_rag_answers,
    }),
    metrics=[
        # answer_relevancy,
        answer_correctness,
        answer_similarity,
    ],
    embeddings=ollama_embed_self,
    llm=wrapper,
)

local_results = evaluate(
    Dataset.from_dict({
        "question": questions,
        "ground_truth": labels,
        "answer": naive_rag_answers,
    }),
    metrics=[
        # answer_relevancy,
        answer_correctness,
        answer_similarity,
    ],
    embeddings=ollama_embed_self,
    llm=wrapper,
)
```

If the project now officially supports this, please let me know the new usage.
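For a fully local setup, the same wrapper pattern should apply to the LLM as well. A hedged sketch with Ollama serving both the judge model and the embeddings, reusing `questions`, `labels`, and `naive_rag_answers` from the snippet above (the model names are examples, and this assumes a ragas version whose `LangchainLLMWrapper` accepts any LangChain chat model):

```python
from datasets import Dataset
from langchain_ollama import ChatOllama, OllamaEmbeddings
from ragas import evaluate
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import answer_correctness, answer_similarity

# Both the judge LLM and the embeddings come from a local Ollama instance
local_llm = LangchainLLMWrapper(
    ChatOllama(model="llama3.3", base_url="http://localhost:11434"))
local_embeddings = LangchainEmbeddingsWrapper(
    OllamaEmbeddings(model="bge-m3", base_url="http://localhost:11434"))

results = evaluate(
    Dataset.from_dict({
        "question": questions,
        "ground_truth": labels,
        "answer": naive_rag_answers,
    }),
    metrics=[answer_correctness, answer_similarity],
    llm=local_llm,
    embeddings=local_embeddings,
)
```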