Benchmark framework to compare agentic AI with different prompt methods and no human in the loop

Open cristobalvch opened this issue 5 months ago • 2 comments

I've added the suggested requirements and am waiting for any other comments. Sorry for the delay :/

cristobalvch avatar Sep 03 '25 10:09 cristobalvch

On the naming, promptbench already exists: https://github.com/microsoft/promptbench

@cristobalvch shall we consider something different that is novel and hints at the security connection (e.g. Prompt2PwnBench)?

vmayoral avatar Sep 04 '25 08:09 vmayoral

Yeah, sure! I like the name. I'll change it now.

cristobalvch avatar Sep 04 '25 09:09 cristobalvch

@vmayoral any more suggestions? Nice contrib!

SoyGema avatar Sep 09 '25 12:09 SoyGema

Hello @SoyGema, we're already in touch with @cristobalvch and this has been digested. The team is meeting him on a call, and @Mery-Sanz will work with @cristobalvch on the integration.

As a side note, this contribution is indeed fantastic and warrants further collaboration. As a result, we're considering funding a part-time position for @cristobalvch to continue the research here and in other related avenues.

vmayoral avatar Sep 10 '25 12:09 vmayoral