simple-evals
simple-evals copied to clipboard
Question: Why is Claude running with temperature 0 and GPT4 with temperature 0.5? wouldn't that be a handicap for Claude in Humaneval?
I'm also confused by this. Why not use temperature = 1 for both models, since that's the default temperature on the APIs?
Could you link to the "Anthropic example" you have in mind in this comment?
temperature: float = 0.0, # default in Anthropic example
Some additional references:
- In the Claude 2 model card:
In all cases where evaluations involve free-form sampling, we evaluate our models at temperature T = 1 to represent normal usage, unless indicated otherwise.
- In the Claude 3 model card, referring to multimodal evaluations:
All the results in Table 3 have been obtained by sampling at temperature T = 0.
- In the Claude 3 model card, referring to the QuALITY benchmark:
We test both Claude 3 and Claude 2 model families in 0-shot and 1-shot settings, sampled with temperature T = 1.