# Awesome Eval Driven Development (EDD)

A curated list of resources, projects, and products to help implement Eval-Driven Development (EDD) for LLM-backed apps.
Eval-Driven Development (EDD) is a methodology for guiding the development of LLM-backed apps with a set of task-specific evals (i.e., prompts, context, and expected outputs used as references).*
These evals inform prompt engineering, model selection, fine-tuning, and so on. Running them after each change gives a quick measure of improvements or regressions as the app evolves; a minimal sketch of this loop follows.
It's Test-Driven Development (TDD) for LLM-backed apps.
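To make the loop concrete, here is a minimal, framework-agnostic sketch in Python. The `EvalCase` structure, the `run_app` stub, and the exact-match scorer are illustrative assumptions rather than part of any tool listed below; a real setup would call your actual app and use a more forgiving scorer (LLM-as-judge, embedding similarity, substring checks, etc.).

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str    # the user/task prompt
    context: str   # retrieved or supplied context
    expected: str  # reference ("golden") output

# Hypothetical stand-in for your LLM-backed app; replace with a real call.
def run_app(prompt: str, context: str) -> str:
    return f"Answer based on: {context}"

# Naive scorer: exact match. Real setups use fuzzier checks
# (LLM-as-judge, embedding similarity, regex/contains, etc.).
def score(actual: str, expected: str) -> float:
    return 1.0 if actual.strip() == expected.strip() else 0.0

EVALS = [
    EvalCase(
        prompt="What is the refund window?",
        context="Refunds are accepted within 30 days of purchase.",
        expected="30 days",
    ),
    # ...more task-specific cases
]

def run_evals() -> float:
    """Run every eval case and return the aggregate pass rate."""
    results = [score(run_app(c.prompt, c.context), c.expected) for c in EVALS]
    pass_rate = sum(results) / len(results)
    print(f"pass rate: {pass_rate:.0%} ({int(sum(results))}/{len(results)})")
    return pass_rate

if __name__ == "__main__":
    # Re-run after each prompt/model/pipeline change to spot
    # improvements or regressions, much like a TDD test suite.
    run_evals()
```

Tracking the aggregate pass rate across commits is what turns a handful of cases into a regression suite for prompt and model changes.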
## Open-source LLM-backed app evaluation products
| Name | Description |
|---|---|
| Auto Evaluator | Evaluation tool for LLM QA chains |
| DeepEval | Evaluation and Unit Testing for LLMs (see the usage sketch below this table) |
| Evals | A framework for evaluating LLMs and LLM systems |
| Phoenix | Evaluate, troubleshoot, and fine-tune your LLM in a notebook |
| Ragas | Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines |
| Uptrain | Your open-source LLM evaluation toolkit |
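As an illustration of how one of these tools expresses such evals, below is a sketch following DeepEval's documented pytest-style quickstart pattern. The class, metric, and function names (`LLMTestCase`, `AnswerRelevancyMetric`, `assert_test`) come from DeepEval's docs but may differ across versions, the test data is made up, and the metric assumes an evaluation model (e.g. an OpenAI API key) is configured.

```python
# Sketch of DeepEval's pytest-style idiom (API may vary by version);
# the test data is illustrative only.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_refund_policy_answer():
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",  # prompt
        actual_output="We offer a 30-day full refund, no questions asked.",  # your app's output
        retrieval_context=["All customers get a full refund within 30 days."],  # supplied context
    )
    # Answer relevancy is an LLM-as-judge metric scored against a threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```

Run it with pytest or DeepEval's own test runner, the same way a TDD suite gates code changes.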
## Paid LLM-backed app evaluation products
| Name | Distribution | Maturity | Self-service signup |
|---|---|---|---|
| Freeplay | SaaS | Private Beta | No |
| Patronus AI | SaaS | Released | No |
## References
* Definition adapted from [Patterns for Building LLM-based Systems & Products](https://eugeneyan.com/writing/llm-patterns/) by Eugene Yan.