# Awesome Eval Driven Development (EDD)

A curated list of resources, projects, and products to help implement Eval-Driven Development (EDD) for LLM-backed apps.
Eval-Driven Development (EDD) is a methodology for guiding the development of LLM-backed apps with a set of task-specific evals (i.e., prompts, context, and expected outputs used as references).*
These evals inform prompt engineering, model selection, fine-tuning, and so on. Running them after each change gives a quick measure of improvements or regressions as the app evolves; a minimal sketch of this loop follows.
It's Test-Driven Development (TDD) for LLM-backed apps.
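To make the loop concrete, here is a minimal, framework-agnostic sketch in Python. The `EvalCase` structure, the `run_app` stub, and the exact-match scorer are illustrative assumptions rather than part of any tool listed below; a real setup would call your actual app and use a more forgiving scorer (LLM-as-judge, embedding similarity, substring checks, etc.).

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str    # the user/task prompt
    context: str   # retrieved or supplied context
    expected: str  # reference ("golden") output

# Hypothetical stand-in for your LLM-backed app; replace with a real call.
def run_app(prompt: str, context: str) -> str:
    return f"Answer based on: {context}"

# Naive scorer: exact match. Real setups use fuzzier checks
# (LLM-as-judge, embedding similarity, regex/contains, etc.).
def score(actual: str, expected: str) -> float:
    return 1.0 if actual.strip() == expected.strip() else 0.0

EVALS = [
    EvalCase(
        prompt="What is the refund window?",
        context="Refunds are accepted within 30 days of purchase.",
        expected="30 days",
    ),
    # ...more task-specific cases
]

def run_evals() -> float:
    """Run every eval case and return the aggregate pass rate."""
    results = [score(run_app(c.prompt, c.context), c.expected) for c in EVALS]
    pass_rate = sum(results) / len(results)
    print(f"pass rate: {pass_rate:.0%} ({int(sum(results))}/{len(results)})")
    return pass_rate

if __name__ == "__main__":
    # Re-run after each prompt/model/pipeline change to spot
    # improvements or regressions, much like a TDD test suite.
    run_evals()
```

Tracking the aggregate pass rate across commits is what turns a handful of cases into a regression suite for prompt and model changes.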
## Open-source LLM-backed app evaluation products
| Name | Description |
|---|---|
| Auto Evaluator | Evaluation tool for LLM QA chains |
| DeepEval | Evaluation and Unit Testing for LLMs (see the usage sketch below this table) |
| Evals | A framework for evaluating LLMs and LLM systems |
| Phoenix | Evaluate, troubleshoot, and fine-tune your LLM in a notebook |
| Ragas | Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines |
| Uptrain | Your open-source LLM evaluation toolkit |
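As an illustration of how one of these tools expresses such evals, below is a sketch following DeepEval's documented pytest-style quickstart pattern. The class, metric, and function names (`LLMTestCase`, `AnswerRelevancyMetric`, `assert_test`) come from DeepEval's docs but may differ across versions, the test data is made up, and the metric assumes an evaluation model (e.g. an OpenAI API key) is configured.

```python
# Sketch of DeepEval's pytest-style idiom (API may vary by version);
# the test data is illustrative only.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_refund_policy_answer():
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",  # prompt
        actual_output="We offer a 30-day full refund, no questions asked.",  # your app's output
        retrieval_context=["All customers get a full refund within 30 days."],  # supplied context
    )
    # Answer relevancy is an LLM-as-judge metric scored against a threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```

Run it with pytest or DeepEval's own test runner, the same way a TDD suite gates code changes.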
## Paid LLM-backed app evaluation products
| Name | Distribution | Maturity | Self-service signup |
|---|---|---|---|
| Freeplay | SaaS | Private Beta | No |
| Patronus AI | SaaS | Released | No |
## References
* Definition adapted from [Patterns for Building LLM-based Systems & Products](https://eugeneyan.com/writing/llm-patterns/) by Eugene Yan.