ProgramFC

Results on GPT-4 are lower than the results presented in the paper?

Open hustcxx opened this issue 2 years ago • 0 comments

Great work. I have some questions for the authors.

  1. I ran the code with GPT-4 as the program generator, using the same parameter settings (N=1, gold evidence). However, the macro-F1 on FEVEROUS is lower than the text-davinci-003 result presented on GitHub:
     - FEVEROUS with GPT-4: 91.05
     - FEVEROUS with text-davinci-003: 92.32 (presented on GitHub)

     This result is very confusing.
  2. Are the results reported in the paper and on GitHub computed on the full dataset or on a sampled subset?
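For reference when comparing these scores, here is a minimal sketch of how macro-F1 is typically computed: the unweighted mean of per-class F1, scaled to 0 to 100. This is an assumption for illustration, not ProgramFC's own evaluation script, and the label strings `supports`/`refutes` are hypothetical.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (as a percentage).

    Classes are the union of gold and predicted labels, the same
    convention as scikit-learn's f1_score(average="macro"), so an
    unexpected predicted label counts as an extra class with F1 = 0.
    """
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return 100.0 * sum(scores) / len(scores)


# Hypothetical labels for a tiny example.
gold = ["supports", "refutes", "supports", "refutes"]
pred = ["supports", "supports", "supports", "refutes"]
print(round(macro_f1(gold, pred), 2))  # 73.33
```

Because the class set includes predicted labels, any model output that is not one of the expected verdicts lowers the score, which is worth checking when swapping program generators.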

hustcxx · Nov 13 '23 08:11