InfiniteBench

Mismatch for longbook_qa_eng

Open: xuandif-cmu opened this issue 1 year ago • 1 comment

Were the GPT4 results evaluated on a different set of longbook_qa_eng? The 'ground_truth' fields in results/gpt4/preds_longbook_qa_eng.jsonl don't seem to match the 'ground_truth' fields in results/chatglm3/preds_longbook_qa_eng.jsonl.

xuandif-cmu, Aug 20 '24 00:08

We have revised the En.QA task, and those two models were evaluated on different versions of the task.

tuantuanzhang, Aug 21 '24 08:08
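
For anyone who wants to check the mismatch reported above, here is a minimal Python sketch that counts how many 'ground_truth' values differ between two prediction files. It rests on assumptions not confirmed in this thread: that each line is a JSON object, and that records in the two files can be paired by line order. The helper names `load_ground_truths` and `count_mismatches` are hypothetical, not part of the InfiniteBench code.

```python
import json
from pathlib import Path


def load_ground_truths(path):
    """Return the 'ground_truth' value of every record in a JSONL file, in file order."""
    truths = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Records without the field yield None (assumed field name: 'ground_truth').
            truths.append(record.get("ground_truth"))
    return truths


def count_mismatches(path_a, path_b):
    """Compare ground truths position by position.

    Assumption: the n-th record in each file refers to the same example.
    Returns (number of differing pairs, record count in A, record count in B).
    """
    a = load_ground_truths(path_a)
    b = load_ground_truths(path_b)
    # Serialize with sorted keys so lists and dicts are compared by content.
    mismatches = sum(
        json.dumps(x, sort_keys=True) != json.dumps(y, sort_keys=True)
        for x, y in zip(a, b)
    )
    return mismatches, len(a), len(b)
```

Calling `count_mismatches("results/gpt4/preds_longbook_qa_eng.jsonl", "results/chatglm3/preds_longbook_qa_eng.jsonl")` from the repository root would return the number of differing pairs along with both record counts. If the counts differ, or most pairs mismatch, that is consistent with the two files coming from different task versions, though line order alone may not be a reliable way to pair records.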