Harsh Raj

Results: 4 issues by Harsh Raj

Hey, thank you for your great work. I just wanted to know how I can run evaluation on the open-source AgentInstruct data in the AgentBench repo. I will be...

## Overview

This pull request introduces a new test, **`deception_adherence`**, designed to evaluate the robustness of LLMs against deceptive instructions. Specifically, this test assesses how well LLMs resist following instructions...

Fix incorrect configuration variable in `garak/resources/tap/tap_main.py`

### Fix

This PR addresses a bug in `garak/resources/tap/tap_main.py` where an incorrect variable name was used for the evaluator model configuration. Specifically, the line:...

Adds MCP server support for `codex`, `claude-code`, `gemini-cli`, and `grok-cli`. Also adds the appworld-mcp adapter.

Merging is blocked until the TODOs below are complete:

- Add oracle solutions for the...