human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
When I run evaluate_functional_correctness sample.jsonl --problem_file=problem.jsonl, I run into the following problem. Can you help me? Thanks. Detailed log: Reading samples... 1it [00:00, 2118.34it/s] Running test suites... 0%| | 0/1...
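For reference, here is a minimal sketch of the two JSONL formats the command expects, modeled on the repository's data/example_problem.jsonl and data/example_samples.jsonl; the "test/0" task is a placeholder, not a real HumanEval problem:

```python
# Minimal sketch of the two JSONL formats (one JSON object per line),
# modeled on the repository's example data files.
import json

problem = {
    "task_id": "test/0",
    "prompt": "def return1():\n",
    "entry_point": "return1",
    "canonical_solution": "    return 1\n",
    "test": "def check(candidate):\n    assert candidate() == 1\n",
}
sample = {"task_id": "test/0", "completion": "    return 1\n"}

with open("problem.jsonl", "w") as f:
    f.write(json.dumps(problem) + "\n")
with open("sample.jsonl", "w") as f:
    f.write(json.dumps(sample) + "\n")
```

With both files well-formed, the command should complete and, per the README, write its results next to the sample file (sample.jsonl_results.jsonl).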
I created a conda environment with Python 3.7 using the exact command from the docs. Then I used OpenAI's text-davinci-002 to generate a samples.jsonl file with 3 results for each...
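A hedged sketch of such a generation loop, following the pattern shown in the repository README; the OpenAI call uses the legacy (pre-1.0) Completion API, and the max_tokens, temperature, and stop values here are assumptions, not recommendations:

```python
# Sketch of a generation loop producing samples.jsonl with 3
# completions per task, using the legacy openai<1.0 Completion API.
import openai
from human_eval.data import read_problems, write_jsonl

def generate_one_completion(prompt: str) -> str:
    # Return only the completion, without the prompt itself.
    resp = openai.Completion.create(
        model="text-davinci-002",
        prompt=prompt,
        max_tokens=300,          # assumed budget
        temperature=0.8,         # assumed sampling temperature
        stop=["\ndef ", "\nclass ", "\nif __name__"],
    )
    return resp["choices"][0]["text"]

problems = read_problems()
num_samples_per_task = 3
samples = [
    dict(task_id=task_id,
         completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)
```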
The prompt states, "Knowing that (a) is less than 100." Why are there test cases like assert candidate(11 * 13 * 7) == True (an argument that evaluates to 1001, well above 100) in the...
I have gotten the evaluation program running successfully, but sometimes an error like "No module named XXX" occurs. I want to know which Python libraries can be called when the...
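If I read execution.py correctly, completions are exec'd in a subprocess of the same Python environment that runs the evaluator, so any package installed in that environment should be importable (the reliability guard disables some dangerous os-level calls, not imports). A small check, with an assumed list of candidate modules:

```python
# Hedged sketch: "No module named XXX" simply means the generated code
# imports a package absent from the evaluation environment. The module
# names below are arbitrary examples.
import importlib.util

def module_available(name: str) -> bool:
    """True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None

for mod in ["math", "numpy", "sympy"]:
    print(mod, module_available(mod))
```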
HumanEval
$ evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl Reading samples... 6it [00:00, 7047.28it/s] Running test suites... 100%|...| 6/6 [00:00
During evaluation the code first uses a ThreadPoolExecutor, and each thread then uses the multiprocessing package. Why not use a ProcessPoolExecutor from the start? Is there a performance consideration behind this?
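A plausible reason (an assumption about the design, not a statement from the maintainers) is that each completion must run with a hard timeout in an isolated interpreter: a child process can be killed mid-execution, while a thread cannot, so the thread pool only dispatches and waits. A minimal sketch of the pattern, not the repository's exact code:

```python
# Threads act as cheap dispatchers; each spawns a real OS process so a
# hung or hostile completion can be terminated with Process.kill(),
# which has no equivalent for threads.
import multiprocessing
from concurrent.futures import ThreadPoolExecutor

def check_one(code: str, timeout: float = 3.0) -> str:
    p = multiprocessing.Process(target=exec, args=(code, {}))
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.kill()          # enforce the timeout by killing the process
        p.join()
        return "timed out"
    return "passed" if p.exitcode == 0 else "failed"

if __name__ == "__main__":
    snippets = ["x = 1 + 1", "while True: pass"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(check_one, snippets)))
```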
Dear HumanEval Maintainers, Thank you so much for sharing this awesome Test Set! I fully understand that due to the nature of a Test Set, we want to keep it...
https://github.com/openai/human-eval/blob/312c5e5532f0e0470bf47f77a6243e02a61da530/human_eval/evaluation.py#L26 This code returns 1 when c=0 and n < k, whereas 0 is expected.
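For context, here is a sketch of the estimator in question together with the edge case reported above; the explicit c == 0 check is one possible fix, not the repository's current code:

```python
# Unbiased pass@k estimator. The guard `n - c < k` is meant for "every
# size-k draw contains a correct sample", but with c == 0 it also fires
# whenever n < k and wrongly returns 1.0; checking c == 0 first (an
# assumed fix) returns 0.0 instead.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: total samples, c: correct samples, k: draw budget."""
    if c == 0:
        return 0.0        # no correct sample can ever be drawn
    if n - c < k:
        return 1.0        # too few failures to fill a size-k draw
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=3, c=0, k=5))   # 0.0; the linked code returns 1.0
```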