
A curated list of resources, projects, and products to help implement Eval-Driven-Development (EDD) for LLM-backed apps.

Awesome Eval Driven Development (EDD)

Eval-Driven-Development (EDD) is a methodology for guiding the development of LLM-backed apps with a set of task-specific evals, each consisting of a prompt, its context, and expected outputs that serve as references.*

These evals guide prompt engineering, model selection, fine-tuning, and so on. We can then run these evals to quickly measure improvements or regressions as the app changes.

It's Test-Driven Development (TDD) for LLM-backed apps.
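
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the API of any tool listed here: `EvalCase`, `passes`, `pass_rate`, and the two stand-in apps are hypothetical names, and the containment-based scoring rule is an assumption chosen for brevity (real evals often use richer metrics). Each eval holds a prompt, a context, and an expected output, and the same set is run against two versions of the app to see whether a change is an improvement or a regression.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    """One task-specific eval: prompt, context, and a reference output."""
    prompt: str
    context: str
    expected: str


def passes(output: str, expected: str) -> bool:
    # Assumed scoring rule: the reference must appear in the output,
    # ignoring case and surrounding whitespace.
    return expected.strip().lower() in output.strip().lower()


def pass_rate(app: Callable[[str, str], str], cases: List[EvalCase]) -> float:
    """Run every eval against `app` and return the fraction that pass."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(passes(app(c.prompt, c.context), c.expected) for c in cases)
    return passed / len(cases)


CASES = [
    EvalCase("What is the capital of France?",
             "France's capital is Paris.", "Paris"),
    EvalCase("Which year was the report published?",
             "The report was published in 2021.", "2021"),
]


def app_v1(prompt: str, context: str) -> str:
    # Stand-in for an LLM-backed app that ignores the supplied context.
    return "I don't know."


def app_v2(prompt: str, context: str) -> str:
    # Stand-in for a revised app that grounds its answer in the context.
    return context


if __name__ == "__main__":
    baseline = pass_rate(app_v1, CASES)
    candidate = pass_rate(app_v2, CASES)
    verdict = "improvement" if candidate > baseline else "no improvement"
    print(f"baseline={baseline:.2f} candidate={candidate:.2f} ({verdict})")
```

In practice the stand-in functions would be replaced by calls into the real app, and the pass rate could be asserted against a threshold in a test suite so that a regression fails the build, much as a unit test would in TDD.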

Open-source LLM-backed app evaluation products

| Name | Description |
| --- | --- |
| Auto Evaluator | Evaluation tool for LLM QA chains |
| DeepEval | Evaluation and Unit Testing for LLMs |
| Evals | A framework for evaluating LLMs and LLM systems |
| Phoenix | Evaluate, troubleshoot, and fine tune your LLM in a notebook |
| Ragas | Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines |
| Uptrain | Your open-source LLM evaluation toolkit |

Paid LLM-backed app evaluation products

| Name | Distribution | Maturity | Self-service signup |
| --- | --- | --- | --- |
| Freeplay | SaaS | Private Beta | No |
| Patronus AI | SaaS | Released | No |

References

* Definition adapted from "Patterns for Building LLM-based Systems & Products" by Eugene Yan.