Benchmark framework to compare agentic AI with different prompt methods and no human in the loop
I added the suggested requirements. I'm awaiting any other comments! Sorry for the delay :/
On the naming, promptbench already exists: https://github.com/microsoft/promptbench
@cristobalvch shall we maybe consider something different that is novel and that somehow hints at the security connection (e.g. Prompt2PwnBench)?
Yeah sure!! I like the name. I will change it now.
@vmayoral any more suggestions? Nice contrib!
Hello @SoyGema, we're already in touch with @cristobalvch and this has been digested. The team is meeting him on a call, and @Mery-Sanz will work with @cristobalvch on the integration.
As a side note, this contribution is indeed fantastic and warrants further collaboration. As a result, we're considering funding a part-time position for @cristobalvch to continue the research here and in other related avenues.