Mo
I will try to look into that.
Are these the ones shared on www.swebench.com or different?
"This repository contains the predictions, execution logs, trajectories, and results for model inference + evaluation runs on the [SWE-bench](https://swe-bench.github.io/) task." https://github.com/swe-bench/experiments
Thanks for the tip @joshkyh
Maybe use a fixed web link for the benchmark image, so that it updates everywhere whenever there is a new agent.
> multiple tries, Range?
> In our experiments, the best models can achieve rather high scores when given multiple tries, and I believe that the practical upper bound is much higher than the current...
> I do not think this is possible anymore. And that is a good thing indeed! (unless you are on openBSD or something of that sort)
Any updates on MCP?
@qcgm1978 looks amazing! Can't wait to try it myself. Thank you for the hard work.