
An apparent bug in DROP's handling of multi-span answers

Open sadra-barikbin opened this issue 1 year ago • 2 comments

Hi there! 🤗

It seems that drop_metrics selects only the first span when an answer is of type multi-span:

https://github.com/huggingface/lighteval/blob/ad42e43bcc3bd50fdba68936999bf553bf53b9e4/src/lighteval/metrics/harness_compatibility/drop.py#L149-L153

Maybe we could remove the first if statement and change the second one accordingly to fix it.
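To make the difference concrete, here is a minimal sketch, assuming the behaviour is as described above. It does not use lighteval's code: `toy_span_f1`, `score_first_span_only` and `score_all_spans` are hypothetical names, and the toy set-level F1 is a stand-in for the real DROP metric.

```python
from typing import Sequence


def toy_span_f1(pred_spans: Sequence[str], gold_spans: Sequence[str]) -> float:
    """Toy set-level F1 over lowercased spans (NOT the official DROP metric)."""
    pred = {s.strip().lower() for s in pred_spans if s.strip()}
    gold = {s.strip().lower() for s in gold_spans if s.strip()}
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_first_span_only(pred_spans, golds):
    # Behaviour described in this report: only the first span of each gold is used.
    return max((toy_span_f1(pred_spans, gold[:1]) for gold in golds), default=0.0)


def score_all_spans(pred_spans, golds):
    # Proposed behaviour: every span of a multi-span gold answer is used.
    return max((toy_span_f1(pred_spans, gold) for gold in golds), default=0.0)


# Hypothetical example: one gold answer made of two spans, and a partial prediction.
golds = [("Tom Brady", "Randy Moss")]
partial_prediction = ["Tom Brady"]

first_only = score_first_span_only(partial_prediction, golds)
all_spans = score_all_spans(partial_prediction, golds)
print(first_only, all_spans)
```

With only the first span considered, the partial prediction looks like a perfect match; with all spans considered, it is penalised for the missing span.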

sadra-barikbin avatar May 21 '24 11:05 sadra-barikbin

I think we had to use this system to be compatible with the results of the Eleuther AI Harness. Could you check whether they have updated the mechanism, so we can see whether to update on our side too? :)

clefourrier avatar May 22 '24 13:05 clefourrier

It seems that it's correct there:

    for gold_answer in golds:
        exact_match, f1_score = get_metrics(preds, gold_answer)
        if gold_answer[0].strip():
            max_em = max(max_em, exact_match)
            max_f1 = max(max_f1, f1_score)
    return {"em": max_em, "f1": max_f1}

sadra-barikbin avatar May 22 '24 16:05 sadra-barikbin

Yes, you are right. I checked the downstream logic and it seems they don't take the first item anywhere. I think it's something we do on the assumption that the length is always one, but we would have to check this and add an assert. If you want to make the edits and open a PR, feel free to :)
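As a rough sketch of the kind of check meant here, assuming each gold answer is a sequence of span strings (the helper names below are hypothetical and not part of lighteval):

```python
from typing import Iterable, Sequence


def count_multi_span_golds(golds: Iterable[Sequence[str]]) -> int:
    """Count gold answers that contain more than one span."""
    return sum(1 for gold in golds if len(gold) > 1)


def assert_single_span(golds: Iterable[Sequence[str]]) -> None:
    """Fail loudly if the 'length is always one' assumption does not hold."""
    golds = list(golds)
    n_multi = count_multi_span_golds(golds)
    assert n_multi == 0, (
        f"{n_multi} of {len(golds)} gold answers have more than one span"
    )
```

Running `assert_single_span` over the gold answers of the evaluation set would show whether taking only the first item ever discards spans.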

clefourrier avatar May 23 '24 08:05 clefourrier