🌟 [Metric Request] WOOD score

Open · astariul opened this issue 5 years ago • 9 comments

WOOD score paper: https://arxiv.org/pdf/2007.06898.pdf

Abstract:

Models that surpass human performance on several popular benchmarks display significant degradation in performance on exposure to Out of Distribution (OOD) data. Recent research has shown that models overfit to spurious biases and ‘hack’ datasets, in lieu of learning generalizable features like humans. In order to stop the inflation in model performance – and thus overestimation in AI systems’ capabilities – we propose a simple and novel evaluation metric, WOOD Score, that encourages generalization during evaluation.

astariul, Jul 15 '20 01:07

Is this being worked on? If not, I'd like to try! I can do this by following the directions outlined here, correct?

kasmith11, Aug 25 '22 14:08

Hi @kasmith11, I don't think anybody is working on it right now. Following the guide will create a community metric (i.e., one you can load with load("kasmith/wood")). To make it an official metric maintained in evaluate, we can simply move the code into metrics/ afterwards. So it's a good start, and you can test it without needing to merge a PR :)

lvwerra, Aug 26 '22 11:08

I would also like to work on this one (new guy here).

sezan92, Aug 26 '22 11:08

I'm very open to collaboration! If you're interested, we can work together on this, @sezan92. Would that change anything you outlined above, @lvwerra?

kasmith11, Aug 26 '22 12:08

Sure, if you'd like to collaborate, this would be a good issue for it :) For communication, you could join our Discord: https://huggingface.co/join/discord

lvwerra, Aug 26 '22 12:08

@kasmith11 Sorry for the late reply. Sure, how would you like to begin?

sezan92, Aug 29 '22 09:08

Hi @sezan92, I took an initial pass at implementing WOOD score here after reading the paper. I haven't gotten a chance to test the implementation or fill out any of the documentation.

I think testing/debugging and documentation are the next steps.

Are you in the huggingface discord linked above? I think that would be a great place for us to communicate via chat going forward.

kasmith11, Aug 29 '22 13:08

@kasmith11 Yes, I just joined. My username is sezan92.

sezan92, Aug 29 '22 13:08

Fantastic @sezan92. I'll reach out to you via discord soon.

kasmith11, Aug 29 '22 15:08

I have a repository with an implementation of WoodScore here. I now have more time to dedicate to this, if you are still interested, @sezan92.

kasmith11, Mar 28 '23 19:03