Design and implement scaffolding for pipeline evaluation
Implement scaffolding code that:
- Accepts...
  - The evaluated pipeline, i.e., the pipeline whose output is to be evaluated.
  - A set of inputs for the above pipeline.
  - The evaluation pipeline, i.e., the one with the evaluation components/metrics.
  - A set of additional inputs for the evaluation pipeline, e.g., labels.
- Runs...
  - The evaluated pipeline with the above inputs.
  - Optionally allows overriding parameters of specific components in said pipeline.
  - The evaluation pipeline with the outputs of the above pipeline.
- Returns...
  - The results of the evaluation pipeline.
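The accept/run/return flow above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed API: `Runnable`, `run_evaluation`, the `"outputs"` key, and the toy pipelines are all hypothetical names, and the only thing assumed about a pipeline is that it exposes `run(data)` and returns a dict keyed the way the caller expects.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional, Protocol


class Runnable(Protocol):
    """Hypothetical minimal pipeline interface: run(data) returns a dict."""

    def run(self, data: Dict[str, Any]) -> Dict[str, Any]: ...


def run_evaluation(
    evaluated: Runnable,
    evaluator: Runnable,
    inputs: List[Dict[str, Dict[str, Any]]],
    eval_inputs: Optional[Dict[str, Any]] = None,
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Run `evaluated` on every input, then feed its outputs to `evaluator`."""
    outputs = []
    for data in inputs:
        merged = deepcopy(data)  # never mutate the caller's inputs
        # Assumption: overrides map component name -> parameters to replace.
        for component, params in (overrides or {}).items():
            merged.setdefault(component, {}).update(params)
        outputs.append(evaluated.run(merged))
    # Assumption: the evaluator reads the collected outputs under "outputs",
    # next to any additional inputs such as labels.
    return evaluator.run({"outputs": outputs, **(eval_inputs or {})})


# Toy stand-ins so the sketch runs without any real pipeline.
class UppercasePipeline:
    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        params = data["shout"]
        return {"text": params["text"].upper() + params.get("suffix", "")}


class ExactMatchPipeline:
    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        pairs = zip(data["outputs"], data["labels"])
        hits = [output["text"] == label for output, label in pairs]
        return {"exact_match": sum(hits) / len(hits)}


result = run_evaluation(
    UppercasePipeline(),
    ExactMatchPipeline(),
    inputs=[{"shout": {"text": "hi"}}, {"shout": {"text": "yo"}}],
    eval_inputs={"labels": ["HI!", "NO!"]},
    overrides={"shout": {"suffix": "!"}},
)
print(result)  # {'exact_match': 0.5}
```

Copying each input before merging the overrides keeps the caller's data reusable across runs with different overrides.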
The scaffold will further allow specialization for individual use cases. For instance, an end-to-end RAG pipeline evaluation harness can be built on top of it, implementing a RAG-specific API.
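One possible shape for such a specialization is a thin subclass that translates a domain-specific call into the generic one. The sketch below is an assumption-laden illustration: `EvaluationHarness`, `RAGEvaluationHarness`, `run_rag`, and the `"query"`, `"answer"`, and `"ground_truths"` keys are hypothetical, and pipelines are modelled as plain callables to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical: a pipeline is any callable that takes and returns a dict.
Pipeline = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class EvaluationHarness:
    """Generic scaffold: run the evaluated pipeline, then the evaluator."""

    evaluated: Pipeline
    evaluator: Pipeline

    def run(
        self, inputs: List[Dict[str, Any]], eval_inputs: Dict[str, Any]
    ) -> Dict[str, Any]:
        outputs = [self.evaluated(data) for data in inputs]
        return self.evaluator({"outputs": outputs, **eval_inputs})


class RAGEvaluationHarness(EvaluationHarness):
    """RAG-specific API: callers pass questions and ground-truth answers."""

    def run_rag(
        self, questions: List[str], ground_truths: List[str]
    ) -> Dict[str, Any]:
        # Translate the RAG vocabulary into the generic input format.
        inputs = [{"query": question} for question in questions]
        return self.run(inputs, {"ground_truths": ground_truths})


# Toy RAG pipeline and metric, standing in for real components.
def toy_rag(data: Dict[str, Any]) -> Dict[str, Any]:
    knowledge = {"capital of France?": "Paris", "capital of Italy?": "Rome"}
    return {"answer": knowledge.get(data["query"], "unknown")}


def toy_accuracy(data: Dict[str, Any]) -> Dict[str, Any]:
    pairs = zip(data["outputs"], data["ground_truths"])
    correct = sum(output["answer"] == truth for output, truth in pairs)
    return {"accuracy": correct / len(data["ground_truths"])}


harness = RAGEvaluationHarness(toy_rag, toy_accuracy)
report = harness.run_rag(
    ["capital of France?", "capital of Spain?"], ["Paris", "Madrid"]
)
print(report)  # {'accuracy': 0.5}
```

The generic `run` stays untouched, so other use cases can add their own entry points in the same way.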
Related to #7415.