stagehand
Eval CI should fail if it detects a regression
Our GitHub Actions eval CI should not be green if a change introduces a regression in the evals.
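One possible way to gate the job, as a rough sketch only: compare the current eval scores against a stored baseline and return a non-zero exit code when any eval drops. The score format (eval name mapped to a number), the `findRegressions` and `exitCodeFor` helpers, the `tolerance` parameter, and the eval names below are all hypothetical and are not taken from the existing eval setup.

```typescript
// Hypothetical score format: eval name -> score in [0, 1].
type EvalScores = Record<string, number>;

interface Regression {
  name: string;
  baseline: number;
  current: number | undefined; // undefined when the eval is missing from the current run
}

// Returns every eval whose current score is missing or fell more than
// `tolerance` below its baseline score.
function findRegressions(
  baseline: EvalScores,
  current: EvalScores,
  tolerance: number = 0,
): Regression[] {
  const regressions: Regression[] = [];
  for (const [name, base] of Object.entries(baseline)) {
    const score: number | undefined = current[name];
    if (score === undefined || score < base - tolerance) {
      regressions.push({ name, baseline: base, current: score });
    }
  }
  return regressions;
}

// Maps the result to a process exit code: 0 keeps the job green,
// 1 makes the CI step fail.
function exitCodeFor(regressions: Regression[]): number {
  return regressions.length > 0 ? 1 : 0;
}

// Example with made-up numbers: "act_login" drops from 0.9 to 0.7.
const found = findRegressions(
  { extract_basic: 1.0, act_login: 0.9 },
  { extract_basic: 1.0, act_login: 0.7 },
  0.05,
);
for (const r of found) {
  console.log(`regression in ${r.name}: ${r.baseline} -> ${r.current}`);
}
console.log(`exit code: ${exitCodeFor(found)}`);
```

In a CI script, the value from `exitCodeFor` would be handed to the process exit (for example via `process.exit` in Node), since a GitHub Actions step fails when its command exits non-zero. A small tolerance may help avoid failing on noise if the evals are not deterministic.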