evaluate 🌟 [Metric Request] WOOD score

WOOD score paper: https://arxiv.org/pdf/2007.06898.pdf

Abstract:

Models that surpass human performance on several popular benchmarks display significant degradation in performance on exposure to Out of Distribution (OOD) data. Recent research has shown that models overfit to spurious biases and ‘hack’ datasets, in lieu of learning generalizable features like humans. In order to stop the inflation in model performance – and thus overestimation in AI systems’ capabilities – we propose a simple and novel evaluation metric, WOOD Score, that encourages generalization during evaluation.
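To make the request concrete, here is a minimal sketch of one way a metric could reward generalization: a weighted accuracy in which each test sample counts in proportion to how out-of-distribution it is. This is an illustration only, not the formula from the paper. The function name `weighted_ood_accuracy` and the `ood_weights` argument are hypothetical, and how the per-sample weights are obtained is left open here.

```python
from typing import Sequence


def weighted_ood_accuracy(
    predictions: Sequence[int],
    references: Sequence[int],
    ood_weights: Sequence[float],
) -> float:
    """Accuracy in which each sample counts in proportion to its weight.

    Hypothetical helper for illustration; `ood_weights` is assumed to hold
    one non-negative number per test sample, larger for samples that are
    further from the training distribution.
    """
    if not (len(predictions) == len(references) == len(ood_weights)):
        raise ValueError("all inputs must have the same length")
    if any(w < 0 for w in ood_weights):
        raise ValueError("weights must be non-negative")

    total = sum(ood_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Sum the weights of the correctly predicted samples only.
    correct = sum(
        w for p, r, w in zip(predictions, references, ood_weights) if p == r
    )
    return correct / total


if __name__ == "__main__":
    preds = [1, 0, 1, 1]
    refs = [1, 0, 0, 1]
    weights = [1.0, 1.0, 4.0, 2.0]  # made-up weights; third sample is "most OOD"
    print(weighted_ood_accuracy(preds, refs, weights))
```

With these made-up inputs, plain accuracy would be 3/4, while the weighted score is lower because the single mistake falls on the most heavily weighted sample.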

Jul 15 '20 01:07 astariul

Is this being worked on? If not, I'd like to try! I can do this by following the directions outlined here, correct?

Aug 25 '22 14:08 kasmith11

Hi @kasmith11, I don't think anybody is working on it right now. Following the guide will create a community metric (i.e. one you can load with load("kasmith/wood")). To make it an official metric maintained in evaluate, we can simply move the code into metrics/ afterwards, so it's a good start and you can test it without needing to merge a PR :)

Aug 26 '22 11:08 lvwerra

I would also like to work on this one. [new guy here]

Aug 26 '22 11:08 sezan92

I'm very open to collaboration! If you're interested, we can work together on this @sezan92. Would that change anything you outlined above @lvwerra?

Aug 26 '22 12:08 kasmith11

Sure, if you'd like to collaborate that would be a good issue :) For communication you could join our Discord: https://huggingface.co/join/discord

Aug 26 '22 12:08 lvwerra

@kasmith11 sorry for the late reply. Sure, how would you like to begin?

Aug 29 '22 09:08 sezan92

Hi @sezan92, I took an initial pass at implementing WOOD score here after reading the paper. I haven't gotten a chance to test the implementation or fill out any of the documentation.

I think testing/debugging and documentation are the next steps.

Are you in the Hugging Face Discord linked above? I think that would be a great place for us to communicate via chat going forward.

Aug 29 '22 13:08 kasmith11

@kasmith11 yes, I just joined. My username is sezan92.

Aug 29 '22 13:08 sezan92

Fantastic @sezan92. I'll reach out to you via discord soon.

Aug 29 '22 15:08 kasmith11

I have a repository with an implementation of WoodScore here. I've had more time to dedicate to this, if you are still interested @sezan92.

Mar 28 '23 19:03 kasmith11