EvoEval: Evolving Coding Benchmarks via LLM

Results 2 evoeval issues

Hi -- very nice eval. I'm looking at the difficult subset and it seems like there are a number of problems that are incorrectly specified or have bugs in the...

Got this idea from EvalPlus and BigCodeBench: sometimes it would be good to do apples-to-apples comparisons between models, and if most of the top models are large or...