eval-dev-quality
Sandbox execution
We need a common helper that sandboxes every execution we perform. Right now, an LLM could generate a call that removes all your files, and we would just execute it.
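
One possible shape for such a helper, as a minimal sketch and not a decided design: wrap every command in a `docker run` call with no network, a read-only root filesystem, and only the task's working directory mounted writable. The names `sandboxed_argv` and `run_sandboxed`, the default image, and the resource limits are all hypothetical. Using Docker at all is an assumption, and other isolation mechanisms could fill the same role.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def sandboxed_argv(
    command: Sequence[str],
    work_dir: str,
    image: str = "alpine:3.19",  # Hypothetical default image.
) -> List[str]:
    """Wrap `command` in a `docker run` invocation with restricted access."""
    host_dir = str(Path(work_dir).resolve())
    return [
        "docker", "run",
        "--rm",                           # Remove the container when it exits.
        "--network", "none",              # No network access.
        "--read-only",                    # Root filesystem is read-only.
        "--memory", "512m",               # Assumed limit, tune as needed.
        "--pids-limit", "128",            # Assumed limit, guards against fork bombs.
        "--volume", f"{host_dir}:/work",  # Only this directory is writable.
        "--workdir", "/work",
        image,
        *command,
    ]


def run_sandboxed(
    command: Sequence[str],
    work_dir: str,
    timeout_seconds: float = 60.0,
    image: str = "alpine:3.19",
) -> subprocess.CompletedProcess:
    """Run `command` inside the container and capture its output.

    The timeout stops the local docker client. A container that is still
    running at that point may need separate cleanup.
    """
    return subprocess.run(
        sandboxed_argv(command, work_dir, image),
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
        check=False,
    )
```

With a helper like this, a generated `rm -rf /` could at worst damage the mounted working directory, not the host. Every call site would go through `run_sandboxed` instead of executing commands directly.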