LLM Eval - Implement Faithfulness/Factual Accuracy metric
Depends on https://github.com/deepset-ai/haystack/issues/7022.
Wrap LLMEvaluator to provide a component that calculates the "Faithfulness" or "Factual accuracy" based on the following inputs:
- Questions
- Contexts
- Responses
This component is meant to be plug-and-play: it should ship with a good-enough default prompt and default examples, both of which the user can customize.
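A minimal sketch of what "default but customizable" could look like, written in plain Python so it does not presume the final `LLMEvaluator` interface. Every name here (`FaithfulnessPromptConfig`, `build_prompt`, `DEFAULT_INSTRUCTIONS`, `DEFAULT_EXAMPLES`) is hypothetical, the prompt wording and examples are placeholders, and the context is assumed to be a single string per input:

```python
from dataclasses import dataclass, field

# Hypothetical default prompt; the real wording is still to be decided.
DEFAULT_INSTRUCTIONS = (
    "Given a question, a context, and a response, answer 1 if every claim in "
    "the response is supported by the context, otherwise answer 0."
)

# Hypothetical few-shot examples shipped with the component.
DEFAULT_EXAMPLES = [
    {
        "question": "Where is the Eiffel Tower?",
        "context": "The Eiffel Tower is a landmark in Paris, France.",
        "response": "It is in Paris.",
        "score": 1,
    },
    {
        "question": "Where is the Eiffel Tower?",
        "context": "The Eiffel Tower is a landmark in Paris, France.",
        "response": "It is in Rome.",
        "score": 0,
    },
]


@dataclass
class FaithfulnessPromptConfig:
    """Defaults work out of the box; either field can be overridden."""

    instructions: str = DEFAULT_INSTRUCTIONS
    examples: list = field(default_factory=lambda: list(DEFAULT_EXAMPLES))


def build_prompt(config, question, context, response):
    """Render instructions, few-shot examples, and the tuple to be judged."""
    lines = [config.instructions, ""]
    for ex in config.examples:
        lines += [
            f"Question: {ex['question']}",
            f"Context: {ex['context']}",
            f"Response: {ex['response']}",
            f"Score: {ex['score']}",
            "",
        ]
    lines += [
        f"Question: {question}",
        f"Context: {context}",
        f"Response: {response}",
        "Score:",
    ]
    return "\n".join(lines)
```

With this shape, `FaithfulnessPromptConfig()` gives the plug-and-play behavior, while `FaithfulnessPromptConfig(instructions="...", examples=[...])` lets a user swap in their own prompt and examples.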
The LLM must return a binary value for each input tuple (question, context, response). This lets us calculate the final score for the dataset ourselves.
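As a rough illustration of the aggregation step, the sketch below validates each raw LLM output as `0` or `1` and averages the results. The helper names `parse_binary` and `faithfulness_score` are hypothetical, and using the mean as the final score is an assumption; this issue does not fix the aggregation formula.

```python
def parse_binary(raw):
    """Convert one raw LLM output to 0 or 1, rejecting anything else."""
    text = str(raw).strip()
    if text not in {"0", "1"}:
        raise ValueError(f"Expected a binary value ('0' or '1'), got: {raw!r}")
    return int(text)


def faithfulness_score(raw_outputs):
    """Average the per-tuple binary verdicts (assumed aggregation: mean)."""
    verdicts = [parse_binary(raw) for raw in raw_outputs]
    if not verdicts:
        raise ValueError("Cannot compute a score for an empty dataset")
    return sum(verdicts) / len(verdicts)


# One verdict per (question, context, response) tuple.
score = faithfulness_score(["1", "0", " 1 ", 1])
```

Enforcing the binary constraint at parse time means a malformed LLM answer fails loudly instead of silently skewing the dataset score.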