APPS: Automated Programming Progress Standard (NeurIPS 2021)
Hi @xksteven, I have a question about why you advise running the evaluation code for one solution at a time instead of doing it for all generations at...
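One common reason to evaluate a single solution at a time is isolation: generated programs can hang, leak memory, or crash the interpreter, and running each one in its own subprocess with a timeout keeps a single bad generation from taking down the whole batch. A minimal sketch of that pattern (the `eval_single.py` script and its flags are hypothetical placeholders, not the repo's actual CLI):

```python
# Hypothetical sketch: run each generation in its own subprocess so a hanging
# or crashing solution cannot corrupt the rest of the evaluation run.
# "eval_single.py" and its flags are placeholders, not the repo's real CLI.
import subprocess

def evaluate_one(problem_idx: int, generation_idx: int, timeout: int = 60) -> bool:
    try:
        proc = subprocess.run(
            ["python", "eval_single.py",
             "--problem", str(problem_idx),
             "--generation", str(generation_idx)],
            capture_output=True,
            timeout=timeout,
        )
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # treat a hang as a failed solution
```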
Hello, I am trying to evaluate my model's generated code using the scripts in eval. However, for a particular problem, results[index] turns out to be an empty array as a result...
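An empty `results[index]` usually means the harness produced no per-test outcomes for that problem (for example, it crashed or timed out before running any test case), so it is worth guarding for that case before averaging. A hedged sketch, assuming `results[index]` holds one list of per-test-case outcomes per generation (True for a pass, False for a wrong answer, negative codes for errors):

```python
# Sketch: compute per-problem accuracy while flagging problems whose result
# list came back empty instead of letting them crash or skew the average.
from typing import Dict, List
import numpy as np

def per_problem_accuracy(results: Dict[int, List]) -> Dict[int, float]:
    accs = {}
    for index, res in results.items():
        if not res or not res[0]:
            # The harness returned nothing for this problem; count it as 0 and flag it.
            print(f"Warning: empty results for problem {index}")
            accs[index] = 0.0
            continue
        outcomes = np.asarray(res[0])  # one entry per test case
        accs[index] = float(np.mean(outcomes == True))  # only exact True counts as a pass
    return accs
```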
Hi, thanks for your work. I don't quite understand the role of check5 in the evaluation process; it seems to produce some incorrect results. Here is an example of 4496...
Hi, thanks for the amazing work! I want to ask about the detailed steps for post-processing generated code solutions when testing one solution. (e.g. After a code solution was generated, did...
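For what it's worth, a minimal post-processing pass (a hedged guess, not necessarily the authors' exact procedure) is to cut the echoed prompt off at the final `ANSWER:` marker and drop anything after an end-of-text token:

```python
# Hedged sketch of a minimal post-processing step; the "ANSWER:" marker follows
# the APPS-style prompt format and "<|endoftext|>" is the GPT-2 end token.
def postprocess(generation: str) -> str:
    # If the prompt was echoed back, keep only what follows the last ANSWER: marker.
    if "ANSWER:" in generation:
        generation = generation.split("ANSWER:")[-1]
    # Drop anything the model emitted after an end-of-text token.
    generation = generation.split("<|endoftext|>")[0]
    return generation.strip()
```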
I'm trying to run the pre-trained 1.5B model linked in [the README](https://drive.google.com/file/d/1XW1Od9L-5l9zXl1HUCyER5pS9zQTbIvU/view?usp=sharing) on the APPS test set. I downloaded the dataset and ran the script `train/apps_create_split.py` on it, then ran...
Some problems in APPS are long, so I truncated them after encoding. But the output of the model is in the form of "problem + answer", so output...
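One way to avoid having to strip the echoed problem text at all is to decode only the tokens generated past the prompt length. A sketch using Hugging Face `transformers` (the model name and prompt shape below are placeholders):

```python
# Sketch: slice off the prompt tokens before decoding, so the (possibly
# truncated) problem statement never appears in the returned string.
# "gpt2" and the prompt text are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "QUESTION:\n<problem statement here>\nANSWER:\n"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

# Everything before inputs["input_ids"].shape[1] is just the echoed prompt.
generated_ids = output_ids[0][inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(generated_ids, skip_special_tokens=True)
```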
Not necessarily an issue, but I noticed that for train/val the answer_type is based on whether starter_code exists, while at eval time it's based on fn_name. Is there a...
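For reference, the two signals are normally two views of the same split: call-based problems are the ones whose `input_output.json` carries a `fn_name` and which ship starter code, while standard-input problems have neither. A hedged sketch of that check (file names follow the APPS per-problem layout; the returned labels are only illustrative):

```python
# Hedged sketch: derive the answer type from either signal and flag problems
# where the two disagree. File names follow the APPS per-problem layout.
import json
import os

def answer_type(problem_dir: str) -> str:
    io_path = os.path.join(problem_dir, "input_output.json")
    starter_path = os.path.join(problem_dir, "starter_code.py")

    has_fn_name = False
    if os.path.exists(io_path):
        with open(io_path) as f:
            has_fn_name = "fn_name" in json.load(f)

    has_starter = os.path.exists(starter_path) and os.path.getsize(starter_path) > 0

    if has_fn_name != has_starter:
        # The two signals should normally agree; mismatches are worth inspecting.
        print(f"Mismatch in {problem_dir}: fn_name={has_fn_name}, starter_code={has_starter}")

    return "call-based" if has_fn_name else "standard-input"
```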
Hi, I'm encountering a problem in evaluating the solutions. For a preliminary pipeline in which I want to process the whole APPS benchmark with an LLM, I'm just taking one random...
pyext no longer installs on Python 3.11 because of changes to the inspect module; this fork fixes the install issue.
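For anyone hitting this before switching to the fork: the likely culprit is that Python 3.11 removed `inspect.getargspec` and `inspect.formatargspec`, which stock pyext still references. A quick way to confirm which interpreter you are on and whether the old names are gone:

```python
# Python 3.11 removed inspect.getargspec and inspect.formatargspec; their
# absence is the likely reason stock pyext no longer installs there.
import inspect
import sys

print(sys.version_info)
print("getargspec:", hasattr(inspect, "getargspec"))          # False on 3.11+
print("getfullargspec:", hasattr(inspect, "getfullargspec"))  # True; the replacement
```

Until the fix lands, installing under Python 3.10 or earlier is the simplest workaround.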
Most of the major benchmarks have a leaderboard to rank LLMs against one another, but not this one? Where could I find such results?