Oskar van der Wal
@aneveol, `answer_choices[stereo_antistereo]` wouldn't work as the target, since `stereo_antistereo` is not an integer label of which choice is the most stereotypical answer, but a string indicating whether it is about...
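A minimal sketch of the indexing problem described in the comment above, in plain Python rather than the actual template code. The choice strings, the value `"stereo"`, and the helper `pick_target` are hypothetical and only serve as illustration; they are not taken from the dataset or from promptsource.

```python
# Hypothetical values for illustration; not taken from the actual dataset.
answer_choices = ["first sentence", "second sentence"]
stereo_antistereo = "stereo"  # a string describing the pair, not a choice index


def pick_target(choices, label):
    """Return choices[label], or None when label is not a valid integer index."""
    is_index = isinstance(label, int) and not isinstance(label, bool)
    if is_index and 0 <= label < len(choices):
        return choices[label]
    return None


# Indexing a list with a string fails in plain Python.
try:
    answer_choices[stereo_antistereo]
    indexing_error = None
except TypeError as exc:
    indexing_error = exc

print(type(indexing_error).__name__)
print(pick_target(answer_choices, stereo_antistereo))
print(pick_target(answer_choices, 1))
```

The target therefore needs a field that holds an integer index into the answer choices, or an explicit mapping from the string value to such an index.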
@jzf2101 Sorry, I don't have access to this repository (yet). My PR for CrowS-Pairs can be found [here](https://github.com/bigscience-workshop/promptsource/pull/748). This PR is about the bias-shades dataset, but I agree we need...
@jzf2101 I have cleaned this PR and fixed the prompt answer choices. Let me know if there is anything else required for this PR!
@ArjunSubramonian, apparently, the prompt [convert_to_stereotype](https://github.com/bigscience-workshop/promptsource/blob/e3a22e09d0131a6ca6810ad8684c59eab3ede13d/promptsource/templates/BigScienceBiasEval/bias-shades/spanish/templates.yaml#L48) is raising some issues in [lm-eval-harness](https://github.com/bigscience-workshop/lm-evaluation-harness/pull/37#pullrequestreview-1171340441). Could you have a look at it?
@jzf2101, I have created a new PR for adding French and English prompts without any blocking merge conflicts here: https://github.com/bigscience-workshop/promptsource/pull/837 This PR can be closed.
BBQ required me to implement custom metrics. Interestingly, everything works when running each subset of BBQ individually, but I run into a problem when running the `bbq` group instead: `TypeError:...
For now, I decided against including StereoSet.
@StellaAthena - Winogender currently does not replicate the results reported in the LLaMA v1 paper. @lintangsutawika suggested trying different prompts to see whether one variation agrees. I have no...
@StellaAthena re: mentioning BLOOM using BBQ, you're right that BLOOM didn't explicitly evaluate on BBQ. They did, however, evaluate on HELM and report their results on the bias category...
@justinphan3110cais, mainly because of time constraints on my end. And since I am mainly interested in evaluating autoregressive models, the implementation of StereoSet would also become more complicated. (It was...