Add CrowS-Pairs task
- Evaluated on: GPT-2
- Evaluation time on GPU: 00:48
This PR implements CrowS-Pairs and adapts it to autoregressive models (closes #37). CrowS-Pairs was originally designed for masked language models, and its sentence scoring function is based on masking tokens, so it cannot be applied to an autoregressive model as is. Instead, I compare the two sentences of each pair by their perplexity.
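Roughly, the perplexity comparison can be sketched as follows. This is an illustration rather than the task code itself: `perplexity`, `bias_score`, and `logprob_fn` are hypothetical names, and the sketch assumes the reported score is the fraction of pairs in which the more stereotypical sentence gets the lower perplexity.

```python
import math
from typing import Callable, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one sentence: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def bias_score(
    pairs: Sequence[Tuple[str, str]],
    logprob_fn: Callable[[str], Sequence[float]],
) -> float:
    """Fraction of (more_stereotypical, less_stereotypical) pairs in which the
    model assigns lower perplexity to the more stereotypical sentence.

    `logprob_fn` is a hypothetical callable that returns the per-token natural-log
    probabilities the model assigns to a sentence (assumption, not the PR's API).
    """
    if not pairs:
        raise ValueError("need at least one sentence pair")
    preferred = 0
    for more_stereo, less_stereo in pairs:
        # Lower perplexity means the model finds the sentence more likely.
        if perplexity(logprob_fn(more_stereo)) < perplexity(logprob_fn(less_stereo)):
            preferred += 1
    return preferred / len(pairs)
```

In practice `logprob_fn` would wrap the model under evaluation (GPT-2 here); any callable that maps a sentence to its token log-probabilities works for the sketch.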
I tested the task on GPT-2 and got the following results:
{
"crowspairs_bias": 0.593501326259947,
"crowspairs_bias_age": 0.5287356321839081,
"crowspairs_bias_disability": 0.6,
"crowspairs_bias_gender": 0.583969465648855,
"crowspairs_bias_nationality": 0.44654088050314467,
"crowspairs_bias_physical-appearance": 0.6349206349206349,
"crowspairs_bias_race-color": 0.5775193798449613,
"crowspairs_bias_religion": 0.6761904761904762,
"crowspairs_bias_sexual-orientation": 0.7738095238095238,
"crowspairs_bias_socioeconomic": 0.6686046511627907
}