llm-evaluation-framework topic

Repositories tagged with llm-evaluation-framework:

promptfoo

6.9k stars · 552 forks

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.
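The "declarative configs" the blurb mentions are YAML files. A minimal sketch follows; the provider ids, prompt, and test values are illustrative assumptions, not taken from the listing:

```yaml
# promptfooconfig.yaml -- minimal sketch; providers and test data are illustrative
prompts:
  - "Summarize in one sentence: {{text}}"
providers:
  - openai:gpt-4o-mini                              # assumed model id
  - anthropic:messages:claude-3-5-sonnet-20241022   # assumed model id
tests:
  - vars:
      text: "LLM evaluation frameworks compare model outputs against expectations."
    assert:
      - type: contains        # built-in assertion: output must include the value
        value: "evaluation"
```

Running `npx promptfoo@latest eval` in the same directory should evaluate each test case against every listed provider and print a side-by-side comparison.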

parea-sdk-py

74 stars · 6 forks

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

agentic_security

1.7k stars · 229 forks

Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪

MixEval

253 stars · 41 forks

The official evaluation suite and dynamic data release for MixEval.

KIEval

38 stars · 2 forks

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

fm-leaderboarder

19 stars · 5 forks

FM-Leaderboard-er lets you build a leaderboard to find the best LLM or prompt for your own business use case, based on your own data, tasks, and prompts.

realign

18 stars · 1 fork

Realign is a testing and simulation framework for AI applications.

qa_metrics

59 stars · 7 forks

An easy Python package for running quick, basic QA evaluations. It includes standardized QA evaluation metrics and semantic evaluation metrics based on black-box and open-source large language model prompting.
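The two classic QA metrics such packages standardize, exact match and token-level F1, are simple enough to sketch from scratch. The following is an illustrative implementation of the standard SQuAD-style definitions, not the qa_metrics API itself:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation, articles, extra spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    """True iff prediction and gold answer are identical after normalization."""
    return normalize(prediction) == normalize(gold)

def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall against the gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))                    # True
print(round(token_f1("Eiffel Tower in Paris", "the Eiffel Tower"), 3))    # 0.667
```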

contextcheck

91 stars · 12 forks

MIT-licensed framework for testing LLMs, RAG pipelines, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
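contextcheck's actual YAML schema is documented in its repository; purely to illustrate the YAML-plus-CI style the blurb describes, a hypothetical test scenario might look like the sketch below. Every key and the endpoint here are invented for illustration and do not reflect contextcheck's real schema:

```yaml
# hypothetical-scenario.yaml -- illustrative only; keys are NOT contextcheck's real schema
endpoint: http://localhost:8000/chat   # assumed chatbot endpoint under test
steps:
  - ask: "What is your refund policy?"
    expect:
      contains: "30 days"              # fail the CI job if the answer omits this
  - ask: "Do you ship internationally?"
    expect:
      not_contains: "I don't know"     # guard against non-answers
```

The point of such a format is that a CI job can run the scenario file on every commit and fail the build when a chatbot regression slips in.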