Initial set of evaluations for GooseBench

Open zakiali opened this issue 11 months ago • 1 comment

This PR builds on top of https://github.com/block/goose/pull/1307, which adds the goose-bench crate and framework for defining and running evals.

  1. Updates the Evaluation trait so that each metric has an associated description
  2. Adds reporting for displaying eval results and updates the goose bench command
  3. Adds evals for extension tools (developer, computercontroller, and memory), along with search/replace checks on long files
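
To illustrate the first two items, here is a minimal sketch of an evaluation trait whose metrics carry a description, plus a plain-text report over the results. This is not the goose-bench API: the names `Metric`, `MetricValue`, `SearchReplaceEval`, and `render_report` are hypothetical, and the trait is simplified (synchronous, no agent or working directory) to keep the example self-contained.

```rust
use std::fmt;

/// Hypothetical metric value; the variants are an assumption for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum MetricValue {
    Integer(i64),
    Float(f64),
    Boolean(bool),
}

impl fmt::Display for MetricValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MetricValue::Integer(v) => write!(f, "{v}"),
            MetricValue::Float(v) => write!(f, "{v}"),
            MetricValue::Boolean(v) => write!(f, "{v}"),
        }
    }
}

/// A metric paired with a human-readable description of what it measures.
#[derive(Debug, Clone, PartialEq)]
pub struct Metric {
    pub name: String,
    pub description: String,
    pub value: MetricValue,
}

/// Simplified stand-in for an evaluation trait (assumed shape, not the real one).
pub trait Evaluation {
    fn name(&self) -> &str;
    fn run(&self) -> Vec<Metric>;
}

/// Hypothetical eval: checks that an edited file matches the expected content.
pub struct SearchReplaceEval {
    pub expected: String,
    pub actual: String,
}

impl Evaluation for SearchReplaceEval {
    fn name(&self) -> &str {
        "search_replace"
    }

    fn run(&self) -> Vec<Metric> {
        vec![Metric {
            name: "content_matches".to_string(),
            description: "Edited file equals the expected content".to_string(),
            value: MetricValue::Boolean(self.expected == self.actual),
        }]
    }
}

/// Renders one line per eval, followed by one indented line per metric.
pub fn render_report(evals: &[Box<dyn Evaluation>]) -> String {
    let mut out = String::new();
    for eval in evals {
        out.push_str(&format!("{}\n", eval.name()));
        for m in eval.run() {
            out.push_str(&format!("  {} = {} ({})\n", m.name, m.value, m.description));
        }
    }
    out
}
```

Attaching the description to each metric, rather than to the eval as a whole, lets a report explain every number without a separate lookup table.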

zakiali, Feb 20 '25 20:02

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.