autopilot
Add benchmarks to test autopilot quality
Why: We need a way to tell whether autopilot is getting better or worse when we make changes.
What: Implement a set of test-case tasks, run autopilot on each, and have GPT review the output.
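One possible shape for this, as a rough sketch rather than a proposed design: a list of tasks, a loop that runs autopilot on each one, and a reviewer that scores the output. Every name here (`BenchmarkTask`, `run_benchmarks`, `fake_autopilot`, `keyword_reviewer`) is hypothetical. The reviewer is passed in as a plain callable, so a GPT-backed reviewer could be plugged in later; the keyword check below is only a stand-in so the sketch runs without any API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkTask:
    name: str    # short identifier for the test case
    prompt: str  # what autopilot is asked to do
    rubric: str  # what the reviewer should look for in the output


@dataclass
class BenchmarkResult:
    task: str
    output: str
    score: float  # assumed convention: 0.0 (fail) to 1.0 (pass)


def run_benchmarks(
    tasks: List[BenchmarkTask],
    run_autopilot: Callable[[str], str],
    review: Callable[[BenchmarkTask, str], float],
) -> List[BenchmarkResult]:
    """Run autopilot on every task and have the reviewer score each output."""
    results = []
    for task in tasks:
        output = run_autopilot(task.prompt)
        score = review(task, output)
        results.append(BenchmarkResult(task.name, output, score))
    return results


def mean_score(results: List[BenchmarkResult]) -> float:
    """Average score across tasks; 0.0 when there are no results."""
    if not results:
        return 0.0
    return sum(r.score for r in results) / len(results)


# Stand-ins for illustration only; not the real autopilot or a real GPT call.
def fake_autopilot(prompt: str) -> str:
    return f"done: {prompt}"


def keyword_reviewer(task: BenchmarkTask, output: str) -> float:
    # Placeholder for the GPT review: passes if the rubric text appears.
    return 1.0 if task.rubric.lower() in output.lower() else 0.0
```

Comparing `mean_score` before and after a change would give a single number to track, though per-task scores are likely more useful for spotting regressions.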