llm-evaluation-framework topic
promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.
deepeval
The LLM Evaluation Framework
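A minimal pytest-style sketch of what a deepeval check can look like, following the quickstart-style API (LLMTestCase, AnswerRelevancyMetric, assert_test); the input/output strings are hypothetical, and the relevancy metric assumes an LLM judge is configured (e.g. an OpenAI API key).

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_store_hours_answer():
    # Hypothetical example: a question, the model's answer, and the retrieved context.
    test_case = LLMTestCase(
        input="What are your store hours?",
        actual_output="We are open 9am to 5pm, Monday through Friday.",
        retrieval_context=["Store hours: 9am-5pm, Mon-Fri."],
    )
    # Fail the test if answer relevancy scores below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

A file like this can be executed as an ordinary pytest suite or through deepeval's CLI (deepeval test run), which makes it straightforward to drop into CI.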
parea-sdk-py
Python SDK for experimenting with, testing, evaluating, and monitoring LLM-powered applications - Parea AI (YC S23)
agentic_security
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
MixEval
The official evaluation suite and dynamic data release for MixEval.
KIEval
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
fm-leaderboarder
FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your own data, tasks, and prompts.
realign
Realign is a testing and simulation framework for AI applications.
qa_metrics
An easy Python package for running quick, basic QA evaluations. It includes standardized QA evaluation metrics as well as semantic evaluation metrics driven by black-box and open-source large language model prompting.
contextcheck
MIT-licensed framework for testing LLMs, RAGs, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.