RAGAS TimeoutError Exception while Evaluation
I have a self-created dataset with questions, ground truths, contexts, and answers, and I started the evaluation with the RAGAS evaluate() method. The progress bar advances a few percentage points and then stops. I then get several identical errors, after which the evaluation continues and freezes again after a few minutes.
Ragas version: 0.1.13 Python version: 3.10.13
Code to Reproduce
import pandas as pd
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    context_utilization,
    context_entity_recall,
    answer_correctness,
    answer_similarity,
)
from ragas.metrics.critique import harmfulness
from dotenv import load_dotenv
load_dotenv()
df = pd.read_excel('testset_full-200-ES.xlsx')
# Create the 'data_samples' dictionary structure
data_samples = {
    'question': df['question'].tolist(),
    'answer': df['answer'].tolist(),
    'contexts': df['contexts'].apply(lambda x: [x] if pd.notna(x) else []).tolist(),
    'ground_truth': df['ground_truth'].tolist()
}
dataset = Dataset.from_dict(data_samples)
result = evaluate(
    dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        context_utilization,
        context_entity_recall,
        answer_correctness,
        answer_similarity,
        harmfulness,
    ],
)
df = result.to_pandas()
print(result)
# save evaluation results to csv
df.to_csv('results_es.csv', index=False)
Error trace
Evaluating: 11%|█ | 1709/15453 [14:40<330:40:17, 86.61s/it]Exception raised in Job[13246]: TimeoutError()
Exception raised in Job[13243]: TimeoutError()
Exception raised in Job[5118]: TimeoutError()
Exception raised in Job[13245]: TimeoutError()
Exception raised in Job[5125]: TimeoutError()
Exception raised in Job[13241]: TimeoutError()
Exception raised in Job[5126]: TimeoutError()
Exception raised in Job[13227]: TimeoutError()
Exception raised in Job[5122]: TimeoutError()
Exception raised in Job[13244]: TimeoutError()
Exception raised in Job[13240]: TimeoutError()
Exception raised in Job[13236]: TimeoutError()
Exception raised in Job[5113]: TimeoutError()
Exception raised in Job[5124]: TimeoutError()
Exception raised in Job[14126]: TimeoutError()
Evaluating: 13%|█▎ | 2063/15453 [33:06<750:37:32, 201.81s/it]Exception raised in Job[8734]: TimeoutError()
Exception raised in Job[8731]: TimeoutError()
Exception raised in Job[4239]: TimeoutError()
Exception raised in Job[8727]: TimeoutError()
Exception raised in Job[1626]: TimeoutError()
Exception raised in Job[12353]: TimeoutError()
Exception raised in Job[4231]: TimeoutError()
Exception raised in Job[12354]: TimeoutError()
Exception raised in Job[4235]: TimeoutError()
Exception raised in Job[12358]: TimeoutError()
Exception raised in Job[4236]: TimeoutError()
Exception raised in Job[8732]: TimeoutError()
Exception raised in Job[1625]: TimeoutError()
Exception raised in Job[1630]: TimeoutError()
Exception raised in Job[1617]: TimeoutError()
Hey @kaan9700! How can I assist you today?
I found some similar issues that might help you resolve the TimeoutError when using the evaluate() function in the ragas library.
- Timeout Configuration: You need to set the timeout parameter in the RunConfig and ensure it is passed to the evaluation function. Here is an example of how to do it:
from ragas import evaluate, RunConfig
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    context_utilization,
    context_entity_recall,
    answer_correctness,
    answer_similarity,
)
from ragas.metrics.critique import harmfulness
from datasets import Dataset
import pandas as pd
# Load your dataset
df = pd.read_excel('testset_full-200-ES.xlsx')
data_samples = {
    'question': df['question'].tolist(),
    'answer': df['answer'].tolist(),
    'contexts': df['contexts'].apply(lambda x: [x] if pd.notna(x) else []).tolist(),
    'ground_truth': df['ground_truth'].tolist()
}
dataset = Dataset.from_dict(data_samples)
# Create a RunConfig with a specified timeout
run_config = RunConfig(timeout=120)  # Set timeout to 120 seconds
# Use the evaluate function with the custom run_config
result = evaluate(
    dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        context_utilization,
        context_entity_recall,
        answer_correctness,
        answer_similarity,
        harmfulness,
    ],
    run_config=run_config
)
df = result.to_pandas()
print(result)
df.to_csv('results_es.csv', index=False)
- Known Issues and Limitations: There are known issues with using certain configurations in the ragas library, such as the lack of support for asynchronous callbacks and specific configurations required for different types of models [1].
- Debugging and Logging: Enable debugging and logging options to diagnose the TimeoutError. Set the log_tenacity parameter in RunConfig to True and enable the RAGAS_DEBUG environment variable:
import os
from ragas import RunConfig
os.environ["RAGAS_DEBUG"] = "true"
run_config = RunConfig(timeout=120, log_tenacity=True)
This will provide comprehensive logging and debugging information [1].
By following these steps, you should be able to resolve the TimeoutError and successfully use the evaluate() function with your dataset.
To continue talking to Dosu, mention @dosu.
I'm getting the same error. Have you fixed it?
+1
@CraigMChen does it still happen with the timeout set, too?
@jjmachan
Here's my code:
from ragas.llms import LangchainLLMWrapper
from ragas import evaluate
from langchain.chat_models import ChatOllama
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
)
llm = ChatOllama(model="llama3.2", base_url="http://127.0.0.1:11434", request_timeout=3000)
print(llm.invoke("hello")) # ok
result = evaluate(
    llm=LangchainLLMWrapper(llm),
    dataset=dataset,
    metrics=[
        context_precision,
        context_recall,
        faithfulness,
        answer_relevancy,
    ],
)
df = result.to_pandas()
Error:
Exception raised in Job[0]: TimeoutError()
Exception raised in Job[3]: TimeoutError()
Exception raised in Job[2]: TimeoutError()
Exception raised in Job[1]: TimeoutError()
Ohh I see you're using Ollama - we don't support that yet, see #1170.
But we also have a batching option in evaluate(), which could help others.
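For anyone landing here, a minimal sketch of combining the two knobs mentioned in this thread (a longer RunConfig timeout plus the batching option), assuming a ragas release where evaluate() exposes a batch_size parameter; the exact name and availability of that parameter may differ between versions:
from ragas import evaluate, RunConfig
from ragas.metrics import (
    context_precision,
    context_recall,
    faithfulness,
    answer_relevancy,
)
# 'dataset' is the same Dataset built earlier in this thread.
# NOTE: batch_size is the batching option mentioned above; treat it as an
# assumption and check your installed ragas version before relying on it.
result = evaluate(
    dataset,
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
    run_config=RunConfig(timeout=300, max_workers=4),  # longer per-job timeout, fewer concurrent LLM calls
    batch_size=10,
)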
It seems the issue has been answered; closing this now.