Addition of SafeArena Benchmark

Open: opened by alzambranolu13

To do:

[ ] Update the README
[ ] Add tests similar to those for VisualWebArena
[ ] Refactor to match the VisualWebArena structure (remove any duplicated code)

Description by Korbit AI

What change is being made?

Adds the SafeArena benchmark to the BrowserGym project. The change introduces the "safearena" configurations, tasks, metadata, and scripts, updates the Makefile to include safearena, and updates the dependencies in pyproject.toml and requirements.txt.
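As an illustration only, the sketch below shows one way a benchmark's task metadata could be turned into per-task IDs with a duplicate check. Everything in it is a hypothetical assumption, not the PR's actual code or BrowserGym's API: the in-memory `TASK_REGISTRY`, the `register_safearena_tasks` helper, the `safearena.<task_id>` naming, and the metadata fields (`task_id`, `intent`).

```python
import json

# Hypothetical in-memory registry standing in for the project's real task registration.
TASK_REGISTRY: dict[str, dict] = {}


def register_safearena_tasks(metadata_json: str, prefix: str = "safearena") -> list[str]:
    """Register one entry per task config found in a metadata JSON string.

    Assumes (hypothetically) that the metadata is a JSON list of objects,
    each carrying a "task_id"; the real SafeArena metadata files may differ.
    """
    configs = json.loads(metadata_json)
    registered = []
    for config in configs:
        gym_id = f"{prefix}.{config['task_id']}"
        # Refuse to register the same ID twice, so duplicated configs surface early.
        if gym_id in TASK_REGISTRY:
            raise ValueError(f"duplicate task id: {gym_id}")
        TASK_REGISTRY[gym_id] = config
        registered.append(gym_id)
    return registered


# Hypothetical two-task metadata file, with placeholder intents.
example = json.dumps([{"task_id": 0, "intent": "..."}, {"task_id": 1, "intent": "..."}])
print(register_safearena_tasks(example))  # ['safearena.0', 'safearena.1']
```

Raising on a duplicate ID is one simple way to catch the kind of duplication the refactoring item in the to-do list aims to remove. A real integration would register tasks through whatever mechanism BrowserGym's other benchmarks use.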

Why are these changes being made?

This integration lets the BrowserGym project benchmark web agents on safety-focused tasks, extending the platform beyond testing what agents can do to testing whether they navigate web environments safely. The addition supports ongoing research on ensuring that AI agents behave safely in applications that involve navigating the internet.

May 27 '25 23:05 alzambranolu13