Benchmark framework to compare agentic AI with different prompt methods and no human in the loop
I added the suggested requirements. I'm awaiting any other comments! Sorry for the delay :/
On the naming, promptbench already exists: https://github.com/microsoft/promptbench
@cristobalvch shall we maybe consider something different that is novel and that somehow hints at the security connection (e.g. Prompt2PwnBench)?
Yeah sure!! I like the name. I will change it now.
@vmayoral any more suggestions? Nice contrib!
Hello @SoyGema, we're already in touch with @cristobalvch and this has been digested. The team is meeting him on a call, and @Mery-Sanz will work with @cristobalvch on the integration.
As a side note, this contribution is indeed fantastic and warrants further collaboration. As a result, we're considering funding a part-time position for @cristobalvch to continue the research here and in other related avenues.