evals
evals copied to clipboard
Is there a list of tasks that chatgpt fails?
Hi, is it possible to see a collection of questions that gpt-4 is failing? I wan't to test some prompts I wrote that improve accuracy, and I thought it would be good to focus on tasks that it is currently failing. Thanks!
try the logic evals https://github.com/openai/evals/tree/main/evals/registry/data/logic
They fail even with cot of reasoning