
Deterministic verification layer for LLMs | AI hallucination detection | Model output validation | Formal verification for AI | Python 🐍


QWED Protocol

The Deterministic Verification Layer for AI

QWED Verification - Production-grade deterministic verification layer for Large Language Models (LLMs). Detect and prevent AI hallucinations through 8 specialized verification engines. Open-source Python framework for AI safety, LLM accuracy testing, and model output validation.

Don't fix the liar. Verify the lie.
QWED does not reduce hallucinations. It makes them irrelevant.

If an AI output cannot be proven, QWED will not allow it into production.

CI License Python 3.10+ Docker DOI status PyPI version Contributors



Quick Start · The Problem · The 8 Engines · FAQ · 📄 Whitepaper · 📚 Docs

⚠️ What QWED Is (and Isn't)

QWED is: An open-source engineering tool that combines existing verification libraries (SymPy, Z3, SQLGlot, AST) into a unified API for LLM output validation.

QWED is NOT: Novel research. We don't claim algorithmic innovation. We claim practical integration for production use cases.

Works when: Developer provides ground truth (expected values, schemas, contracts) and LLM generates structured output.

Doesn't work when: Specs come from natural language, outputs are freeform text, or verification domain is unsupported.
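
To make the "works when" case concrete, here is a minimal sketch of checking structured LLM output against developer-supplied ground truth. It uses only the Python standard library; the `CONTRACT` layout and the `check_against_contract` helper are illustrative assumptions, not part of the QWED API.

```python
import json

# Hypothetical contract: field name -> (expected type, expected value or None).
# None means "any value of that type is acceptable".
CONTRACT = {
    "currency": (str, "USD"),
    "total": (float, 162889.46),
    "years": (int, None),
}

def check_against_contract(raw_output: str, contract: dict) -> list[str]:
    """Return a list of violations; an empty list means the output passed."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for field, (expected_type, expected_value) in contract.items():
        if field not in data:
            problems.append(f"missing field: {field}")
            continue
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly
        # (this sketch's contract has no boolean fields).
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"wrong type for {field}")
        elif expected_value is not None and value != expected_value:
            problems.append(f"wrong value for {field}: {value!r}")
    return problems

good = '{"currency": "USD", "total": 162889.46, "years": 10}'
bad = '{"currency": "USD", "total": 150000.0}'
print(check_against_contract(good, CONTRACT))  # no violations
print(check_against_contract(bad, CONTRACT))   # wrong total, missing years
```

Freeform text has no comparable contract to check against, which is why that case falls outside the supported scope.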


🚀 Quick Start: Install & Verify in 30 Seconds

# Install from PyPI (Recommended)
pip install qwed

# Or install from source
git clone https://github.com/QWED-AI/qwed-verification.git
cd qwed-verification
pip install -e .

from qwed_sdk import QWEDClient

client = QWEDClient(api_key="your_key")

# The LLM says: "Derivative of x^2 is 3x" (Hallucination!)
response = client.verify_math(
    query="What is the derivative of x^2?",
    llm_output="3x" 
)

print(response)
# -> ❌ CORRECTED: The derivative is 2x. (Verified by SymPy)
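
For intuition about what a symbolic check of this kind involves, the sketch below reproduces the derivative comparison with SymPy directly. It does not show QWED's internals; `derivative_matches` is a hypothetical helper, and it assumes answers are written in SymPy-parsable syntax such as `3*x`.

```python
import sympy as sp

def derivative_matches(expression: str, claimed: str, variable: str = "x") -> bool:
    """Return True if `claimed` equals d(expression)/d(variable) symbolically."""
    var = sp.Symbol(variable)
    actual = sp.diff(sp.sympify(expression), var)
    # Two expressions are equal when their difference simplifies to zero.
    return sp.simplify(actual - sp.sympify(claimed)) == 0

print(derivative_matches("x**2", "3*x"))  # False: the claimed answer is wrong
print(derivative_matches("x**2", "2*x"))  # True
```

Note that `sympify` evaluates its input string, so a production check would need to sanitize or restrict untrusted LLM output before parsing it.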

🚨 The LLM Hallucination Problem: Why AI Can't Be Trusted

Most attempts to fix AI hallucinations rely on fine-tuning (training the model on more data).

This is like forcing a student to memorize 1,000,000 math problems.

What happens when they see the 1,000,001st problem? They guess.


📊 The Proof: Why Enterprise AI Needs QWED Verification

We benchmarked Claude Opus 4.5 (one of the world's best LLMs) on 215 critical tasks.

QWED Benchmark Results - LLM Accuracy Testing

| Finding | Implication |
|---|---|
| Finance: 73% accuracy | Banks can't use raw LLM output for calculations |
| Adversarial: 85% accuracy | LLMs fall for authority bias tricks |
| QWED: 100% error detection | All 22 errors caught before production |

QWED doesn't compete with LLMs. We ENABLE them for production use.

📄 Full Benchmark Report →


🎯 Use Cases & Applications

QWED is designed for industries where AI errors have real consequences:

| Industry | Use Case | Risk Without QWED |
|---|---|---|
| 🏦 Financial Services | Transaction validation, fraud detection | $12,889 error per miscalculation |
| 🏥 Healthcare AI | Drug interaction checking, diagnosis verification | Patient safety risks |
| ⚖️ Legal Tech | Contract analysis, compliance checking | Regulatory violations |
| 📚 Educational AI | AI tutoring, assessment systems | Misinformation to students |
| 🏭 Manufacturing | Process control, quality assurance | Production defects |

✅ The Solution: Give the AI a Calculator

QWED doesn't try to make the LLM "smarter".

It treats the LLM as an untrusted translator and verifies its output using Deterministic Engines (SymPy, Z3, SQLGlot, AST).

"If an AI writes code, QWED runs the security audit.
If an AI does math, QWED runs the calculus."

┌──────────────┐
│  User Query  │
└──────┬───────┘
       │
       ▼
┌──────────────────┐
│ LLM (The Guesser)│
│ GPT-4 / Claude   │
└──────┬───────────┘
       │ Unverified Output
       ▼
┌────────────────────┐
│  QWED Protocol     │
│  (Verification)    │
└──────┬─────────────┘
       │
   ┌───┴────┐
   ▼        ▼
❌ Reject  ✅ Verified
            │
            ▼
   ┌─────────────────┐
   │ Your Application│
   └─────────────────┘
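
The flow in the diagram can be sketched as a small gate function. This is an illustration under stated assumptions rather than QWED's implementation: `VerificationError`, `gate`, and `check_sum` are hypothetical names, and the verifier here is a simple exact-arithmetic check.

```python
from fractions import Fraction
from typing import Callable

class VerificationError(Exception):
    """Raised when an LLM output fails its deterministic check."""

def gate(llm_output: str, verifier: Callable[[str], bool]) -> str:
    """Pass the output through only if the verifier accepts it."""
    if not verifier(llm_output):
        raise VerificationError(f"rejected unverified output: {llm_output!r}")
    return llm_output

def check_sum(claimed: str) -> bool:
    """Deterministic check for the query 'What is 0.1 + 0.2?'."""
    try:
        # Fractions give exact decimal arithmetic, unlike binary floats.
        return Fraction(claimed) == Fraction("0.1") + Fraction("0.2")
    except (ValueError, ZeroDivisionError):
        return False  # output is not a usable number at all

print(gate("0.3", check_sum))  # verified, reaches the application
try:
    gate("0.4", check_sum)
except VerificationError as err:
    print(err)  # rejected before reaching the application
```

Raising on failure, rather than returning a flag, means the calling application cannot accidentally use an unverified value.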

QWED 🆚 Traditional AI Safety Approaches

| Approach | Accuracy | Deterministic | Explainable | Best For |
|---|---|---|---|---|
| QWED Verification | ✅ 99%+ | ✅ Yes | ✅ Full trace | Production AI |
| Fine-tuning / RLHF | ⚠️ ~85% | ❌ No | ❌ Black box | General improvement |
| RAG (Retrieval) | ⚠️ ~80% | ❌ No | ⚠️ Limited | Knowledge grounding |
| Prompt Engineering | ⚠️ ~70% | ❌ No | ⚠️ Limited | Quick fixes |
| Guardrails | ⚠️ Variable | ❌ No | ⚠️ Reactive | Content filtering |

QWED doesn't replace these - it complements them with mathematical certainty.


🔧 The 8 Verification Engines: How QWED Validates LLM Outputs

We don't use another LLM to check your LLM. That's circular logic.

We use Hard Engineering:

| Engine | Tech Stack | What it Solves |
|---|---|---|
| 🧮 Math Verifier | SymPy + NumPy | Calculus, Linear Algebra, Finance. No more $1 + $1 = $3. |
| ⚖️ Logic Verifier | Z3 Prover | Formal Verification. Checks for logical contradictions. |
| 🛡️ Code Security | AST + Semgrep | Catches eval(), secrets, vulnerabilities before code runs. |
| 📊 Stats Engine | Pandas + Wasm | Sandboxed execution for trusted data analysis. |
| 🗄️ SQL Validator | SQLGlot | Prevents Injection & validates schema. |
| 🔍 Fact Checker | TF-IDF + NLI | Checks grounding against source docs. |
| 👁️ Image Verifier | OpenCV + Metadata | Verifies image dimensions, format, pixel data. |
| 🤝 Consensus Engine | Multi-Provider | Cross-checks GPT-4 vs Claude vs Gemini. |
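
As an illustration of the kind of static check the Code Security row describes, the sketch below walks a Python syntax tree and flags direct calls to `eval` and `exec` without running the code. It uses only the standard-library `ast` module; `find_dangerous_calls` and the `DANGEROUS` set are hypothetical and far narrower than a real audit.

```python
import ast

DANGEROUS = {"eval", "exec"}  # illustrative deny-list, not QWED's rule set

def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each flagged direct call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only plain-name calls such as eval(...) are matched here;
        # aliased or attribute-based calls would need extra handling.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in DANGEROUS):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)

generated = "total = 1 + 1\nresult = eval(user_input)\n"
print(find_dangerous_calls(generated))  # [(2, 'eval')]
```

Because the source is parsed and never executed, the check is safe to run on untrusted generated code.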

🧠 The QWED Philosophy: Verification Over Correction

| ❌ Wrong Approach | ✅ QWED Approach |
|---|---|
| "Let's fine-tune the model to be more accurate" | "Let's verify the output with math" |
| "Trust the AI's confidence score" | "Trust the symbolic proof" |
| "Add more training data" | "Add a verification layer" |
| "Hope it doesn't hallucinate" | "Catch hallucinations deterministically" |

QWED = Query with Evidence and Determinism

Probabilistic systems should not be trusted with deterministic tasks. If it can't be verified, it doesn't ship.


🔌 LLM Framework Integrations

Already using an Agent framework? QWED drops right in.

🦜 LangChain

from qwed_sdk.langchain import QWEDTool

tools = [QWEDTool(verification_type="math"), QWEDTool(verification_type="sql")]

🤖 CrewAI

from qwed_sdk.crewai import QWEDVerifiedAgent

agent = QWEDVerifiedAgent(role="Analyst", allow_dangerous_code=False)

🌍 Multi-Language SDK Support

| Language | Package | Status |
|---|---|---|
| 🐍 Python | qwed | ✅ Available on PyPI |
| 🟦 TypeScript | @qwed-ai/sdk | ✅ Available on npm |
| 🐹 Go | qwed-go | 🟡 Coming Soon |
| 🦀 Rust | qwed | 🟡 Coming Soon |

git clone https://github.com/QWED-AI/qwed-verification.git
cd qwed-verification
pip install -r requirements.txt


---

## 🎯 Real Example: The $12,889 Bug

**User asks AI:** "Calculate compound interest: $100K at 5% for 10 years"

**GPT-4 responds:** "$150,000"  
*(Used simple interest by mistake)*
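
The size of the mistake follows from the two formulas: simple interest gives `P * (1 + r * t)`, while annual compounding gives `P * (1 + r) ** t`. A minimal sketch of that arithmetic using Python's `decimal` module, assuming annual compounding (the function names are illustrative):

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def simple_interest(principal: Decimal, rate: Decimal, years: int) -> Decimal:
    """Final balance with simple interest: P * (1 + r * t)."""
    return (principal * (1 + rate * years)).quantize(CENT, ROUND_HALF_UP)

def compound_interest(principal: Decimal, rate: Decimal, years: int) -> Decimal:
    """Final balance with annual compounding: P * (1 + r) ** t."""
    return (principal * (1 + rate) ** years).quantize(CENT, ROUND_HALF_UP)

p, r, t = Decimal("100000"), Decimal("0.05"), 10
print(simple_interest(p, r, t))    # 150000.00
print(compound_interest(p, r, t))  # 162889.46
print(compound_interest(p, r, t) - simple_interest(p, r, t))  # 12889.46
```
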

**With QWED:**
```python
response = client.verify_math(
    query="Compound interest: $100K, 5%, 10 years",
    llm_output="$150,000"
)
# -> ❌ INCORRECT: Expected $162,889.46
#    Error: Used simple interest formula instead of compound
```

Cost of not verifying: $12,889 error per transaction 💸


❓ Frequently Asked Questions

Q: How does QWED differ from RAG (Retrieval Augmented Generation)?

A: RAG improves the input to the LLM by grounding it in documents. QWED verifies the output deterministically. RAG adds knowledge; QWED adds certainty.

Q: Can QWED work with any LLM?

A: Yes! QWED is model-agnostic and works with GPT-4, Claude, Gemini, Llama, Mistral, and any other LLM. We verify outputs, not models.

Q: Does QWED replace fine-tuning?

A: No. Fine-tuning makes models better at tasks. QWED verifies they got it right. Use both.

Q: Is QWED open source?

A: Yes! Apache 2.0 license. Enterprise features (audit logs, multi-tenancy) are in a separate repo.

Q: What's the latency overhead?

A: Typically <100ms for most verifications. Math and logic proofs are instant. Consensus checks take longer (multiple API calls).


📚 Documentation & Resources

| Resource | Description |
|---|---|
| 📖 Full Documentation | Complete API reference and guides |
| 🔧 API Reference | Endpoints and schemas |
| 📊 Benchmarks | LLM accuracy testing results |
| 🤝 Contributing Guide | How to contribute to QWED |
| 🏗️ Architecture | System design and engine internals |
| 🔒 Security Policy | Reporting vulnerabilities |

🏢 Enterprise Features

Need observability, multi-tenancy, audit logs, or compliance exports?

📧 Contact: [email protected]


📄 License

Apache 2.0 - See LICENSE



📄 Citation

If you use QWED in your research or project, please cite our archived paper:

@software{dass2025qwed,
  author = {Dass, Rahul},
  title = {QWED Protocol: Deterministic Verification for Large Language Models},
  year = {2025},
  publisher = {Zenodo},
  version = {v1.0.0},
  doi = {10.5281/zenodo.18110785},
  url = {https://doi.org/10.5281/zenodo.18110785}
}

Plain text:

Dass, R. (2025). QWED Protocol: Deterministic Verification for Large Language Models (Version v1.1.0). Zenodo. https://doi.org/10.5281/zenodo.18110785


✅ Using QWED in Your Project?

Add this badge to your README to show you're using verified AI:

[![Verified by QWED](https://img.shields.io/badge/Verified_by-QWED-00C853?style=flat&logo=checkmarx)](https://github.com/QWED-AI/qwed-verification)

Preview:
Verified by QWED

This badge tells users that your LLM outputs are deterministically verified, not just "hallucination-prone guesses."


⭐ Star us if you believe AI needs verification




Ready to trust your AI?

"Safe AI is the only AI that scales."


Contribute · Architecture · Security · Documentation