evoeval
EvoEval: Evolving Coding Benchmarks via LLM
Results: 2
evoeval issues
Hi -- very nice eval. I'm looking at the difficult subset and it seems like there are a number of problems that are incorrectly specified or have bugs in the...
Got this idea from EvalPlus and BigCodeBench, and that sometimes it would be good to do apples-to-apples between models, and that if most of the top models are large or...