agent-evaluation topic

Repositories tagged with the agent-evaluation topic

giskard-oss

Stars: 5.1k · Forks: 389 · Watchers: 5.1k

🐢 Open-Source Evaluation & Testing library for LLM Agents

trulens

Stars: 3.0k · Forks: 242 · Watchers: 3.0k

Evaluation and Tracking for LLM Experiments and AI Agents

ai-agents-reality-check

Stars: 51 · Forks: 0 · Watchers: 51

Mathematical benchmark exposing the massive performance gap between real agents and LLM wrappers. Rigorous multi-dimensional evaluation with statistical validation (95% CI, Cohen's h) and reproducible...

coze-loop

Stars: 5.3k · Forks: 723 · Watchers: 5.3k

Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to...

awesome-ai-agent-testing

Stars: 23 · Forks: 4 · Watchers: 23

🤖 A curated list of resources for testing AI agents - frameworks, methodologies, benchmarks, tools, and best practices for ensuring reliable, safe, and effective autonomous AI systems

eval-view

Stars: 31 · Forks: 3 · Watchers: 31

Catch AI agent regressions before you ship. YAML test cases, golden baselines, execution tracing, cost tracking, and CI integration. Works with LangGraph, CrewAI, Anthropic, and OpenAI.

agent-leaderboard

Stars: 209 · Forks: 22 · Watchers: 209

Ranking LLMs on agentic tasks

Learn how to observe, manage, and scale agentic AI apps using Azure AI Foundry with this hands-on workshop.

any-agent

Stars: 1.1k · Forks: 83 · Watchers: 1.1k

A single interface to use and evaluate different agent frameworks

agentune

Stars: 34 · Forks: 3 · Watchers: 34

Tune your AI agent to best meet its KPIs through a cyclic process of analysis, improvement, and simulation.