
I'm not able to reproduce documentation example

Open bluetab-roger-p opened this issue 2 years ago • 6 comments

Describe the bug Hello, I'm new to this library, and the first thing I tried was to replicate the examples from the documentation.

The example explains that the final dataframe has the following column names:

[screenshot: column names from the documentation example]

But the columns returned by test_generator.generate are named as follows:

[screenshot: column names produced by test_generator.generate]

So when I do evaluate(testset), the column naming is incompatible, since evaluate expects question, context, answer ...

I assume that the following mapping should be used at evaluate(...) (see the sketch after this list):

  • ground_truth_context : context
  • ground_truth: answer
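
A minimal sketch of what I mean, assuming evaluate really accepts a column_map keyword (the remap_column_names call in the error trace below hints at it; both the orientation of the mapping and the exact expected names are my guesses):

from ragas.evaluation import evaluate

result = evaluate(
    my_dataset,  # placeholder: a datasets.Dataset, not the TestDataset
    column_map={
        "contexts": "ground_truth_context",  # expected name -> generated name (assumed)
        "answer": "ground_truth",
    },
)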

But another example shows the following structure:

[screenshot: dataset structure from another documentation example]

So I'm a bit confused.

  • Is the ground_truth_context the real context? Or is it ground_truth?
  • Is my mapping correct? Probably not(?)

Apart from the naming-convention confusion, you can see in the traceback that TestDataset doesn't have a rename_columns method.

Based on these examples, I suspect this might be unintended behavior in the evaluate function, or perhaps the examples are incomplete or the documentation isn't up to date.
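
If that is the case, a possible workaround (untested) would be to round-trip through pandas, since TestDataset does offer to_pandas() even though it lacks rename_columns; the target column names here are my assumptions:

from datasets import Dataset
from ragas.evaluation import evaluate

df = testset.to_pandas()  # testset is the output of TestsetGenerator.generate
df = df.rename(columns={"ground_truth_context": "contexts", "ground_truth": "answer"})
eval_ds = Dataset.from_pandas(df)
result = evaluate(eval_ds)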

I'd appreciate clarification on this matter before proceeding further with this fantastic library.

Ragas version: 0.0.22
Python version: 3.11.0

Code to Reproduce

import os
os.environ['OPENAI_API_KEY']='example'
from llama_index import download_loader

SemanticScholarReader = download_loader("SemanticScholarReader")
loader = SemanticScholarReader()
# Narrow down the search space
query_space = "large language models"
# Increase the limit to obtain more documents
documents = loader.load_data(query=query_space, limit=10)


from ragas.testset import TestsetGenerator
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from ragas.llms import LangchainLLM

# documents = load your documents

# Add custom llms and embeddings
generator_llm = LangchainLLM(llm=ChatOpenAI(model="gpt-3.5-turbo"))
critic_llm = LangchainLLM(llm=ChatOpenAI(model="gpt-4"))
embeddings_model = OpenAIEmbeddings()

# Change resulting question type distribution
testset_distribution = {
    "simple": 0.25,
    "reasoning": 0.5,
    "multi_context": 0.0,
    "conditional": 0.25,
}

# percentage of conversational question
chat_qa = 0.2

test_generator = TestsetGenerator(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings_model=embeddings_model,
    testset_distribution=testset_distribution,
    chat_qa=chat_qa,
)
testset = test_generator.generate(documents, test_size=5)

testset_df = testset.to_pandas()

from ragas.evaluation import evaluate

evaluate(testset)

Error trace

Traceback (most recent call last):
  File "C:\Users\roger.pou_bluetab\PycharmProjects\RAGAS\main.py", line 54, in <module>
    evaluate(testset)
  File "C:\Users\roger.pou_bluetab\anaconda3\envs\ragas\Lib\site-packages\ragas\evaluation.py", line 86, in evaluate
    dataset = remap_column_names(dataset, column_map)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\roger.pou_bluetab\anaconda3\envs\ragas\Lib\site-packages\ragas\validation.py", line 14, in remap_column_names
    return dataset.rename_columns(inverse_column_map)
           ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'TestDataset' object has no attribute 'rename_columns'
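
For completeness, the trace suggests that remap_column_names simply calls .rename_columns() on whatever object it is given, and only a Hugging Face datasets.Dataset provides that method. A tiny guard sketch that makes the requirement explicit (the helper is hypothetical, not part of ragas):

from datasets import Dataset

def check_evaluate_input(obj):
    # Hypothetical helper: evaluate() in ragas 0.0.x eventually calls
    # obj.rename_columns(...), which only datasets.Dataset implements,
    # so fail early with a clearer message for anything else.
    if not isinstance(obj, Dataset):
        raise TypeError(f"evaluate() needs a datasets.Dataset, got {type(obj).__name__}")

check_evaluate_input(Dataset.from_dict({"question": ["q"]}))  # passes
# check_evaluate_input(testset)  # would raise: TestDataset is not a Dataset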

bluetab-roger-p · Jan 15 '24

Hi @bluetab-roger-p, the code has been updated with bugfixes and improvements, so you won't be able to reproduce the exact results. That doesn't mean the output you got is wrong; it mostly means we have fixed or improved something.

shahules786 · Jan 16 '24

First of all, thank you for your answer.

I understand this is a WIP, and you're doing a really good job.

But then, I would kindly encourage you to update your documentation.

I really think that the TestsetGenerator and evaluate functions aren't designed to be fully compatible with each other, at least for now, probably due to the recent updates.

Also, the docstrings of the evaluate function should give better instructions about its expected columns and parameters.
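
For instance, a docstring could show the input shape evaluate seems to expect. A sketch (the column names are taken from the docs of this version; the dtypes and nesting are my guesses):

from datasets import Dataset

# Assumed schema: one row per question; contexts is a list of retrieved
# passages and ground_truths a list of reference answers for each row.
eval_ds = Dataset.from_dict({
    "question": ["What are large language models?"],
    "contexts": [["LLMs are neural networks trained on large text corpora."]],
    "answer": ["Large language models are ..."],
    "ground_truths": [["A reference answer goes here."]],
})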

bluetab-roger-p · Jan 17 '24

Thank you @bluetab-roger-p; as an open-source project, we would love your help in doing so.

shahules786 · Jan 17 '24

@shahules786 You need to take this seriously to avoid going down the langchain path: everybody hates the langchain docs because they are broken and not up to date. Please do not prioritize new features without keeping the docs up to date!

The basic tutorials currently return errors. For example, in https://docs.ragas.io/en/stable/howtos/integrations/llamaindex.html, from ragas.llama_index import evaluate fails -> there is no such module

And even if you fix that, you get

    return dataset.rename_columns(inverse_column_map)
           ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'function' object has no attribute 'rename_columns'

from the call result = evaluate(rag.run, metrics, eval_questions, eval_answers)

ogencoglu · Feb 16 '24

@ogencoglu That is very true, and that is something we don't want to become ourselves. I'm working on updating the docs with the latest concepts and hoping to finish it this week.

As of today, however, both the langchain and llamaindex integrations are sadly broken :(. Since we are a 2-person team, it's hard to get our hands on everything, but I do agree with your point on prioritisation.

jjmachan · Feb 19 '24

Hi all, can anyone help me here? My code worked last week but is not running today, throwing the following error: AttributeError: 'dict' object has no attribute 'rename_columns'. I am using the following column names:

from datasets import Dataset, DatasetDict

ds = Dataset.from_dict({"question": eval_questions, "answer": answers, "contexts": contexts, "ground_truth": ground_truth})

fiqa_eval = DatasetDict({"baseline": ds})

result = evaluate(fiqa_eval["baseline"], metrics=metrics, embeddings=dbx_embeddings, llm=chat_model)

It worked perfectly fine before. I'd highly appreciate your help!

lalehsg · Mar 25 '24