llm-benchmarking topic

Repositories tagged with llm-benchmarking:

llm4regression
156 Stars · 21 Forks · 156 Watchers
Examining how large language models (LLMs) perform on various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates.

LLM-Research
60 Stars · 9 Forks · 60 Watchers
A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks.

pint-benchmark
148 Stars · 18 Forks · 148 Watchers
A benchmark for prompt injection detection systems.

LLMEvaluation
152 Stars · 12 Forks · 152 Watchers
A comprehensive guide to LLM evaluation methods, designed to help identify the most suitable evaluation techniques for various use cases and to promote best practices in LLM assessment...

fm-leaderboarder
19 Stars · 5 Forks · 19 Watchers
FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, tasks, and prompts.

MJ-Bench
47 Stars · 5 Forks · 47 Watchers
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"

Awesome-Code-Benchmark
155 Stars · 13 Forks · 155 Watchers
A comprehensive review of code-domain benchmarks for LLM research.

enterprise-deep-research
1.0k Stars · 168 Forks · 1.0k Watchers
Salesforce Enterprise Deep Research.

confabulations
240 Stars · 7 Forks · 240 Watchers
A document-based hallucination (confabulation) benchmark for RAG, with human-verified questions and answers.

BizFinBench
209 Stars · 7 Forks · 209 Watchers
A Business-Driven Real-World Financial Benchmark for Evaluating LLMs.