Oskar van der Wal

3 issues by Oskar van der Wal

This PR implements various popular benchmarks for evaluating social biases in LMs. Where possible, I also aim to validate them, e.g., by comparing with existing implementations or results, or...

- Evaluated on: GPT-2
- Time evaluating on GPU: 00:48

Here is my attempt at implementing CrowS-Pairs and making it suitable for autoregressive models (closes #37). CrowS-Pairs was originally designed...
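One common way to adapt CrowS-Pairs to an autoregressive model is to score each sentence by its total log-likelihood, the sum of log p(w_t | w_<t), and report how often the model prefers the more stereotypical sentence of each pair. This is a minimal sketch under that assumption, not necessarily the method used in this PR; the original metric instead uses a masked-LM pseudo-log-likelihood. The names `sentence_log_likelihood` and `crows_pairs_bias_score` are hypothetical, and the scorer is passed in as a plain callable so no particular model library is assumed.

```python
from typing import Callable, Sequence, Tuple


def sentence_log_likelihood(token_logprobs: Sequence[float]) -> float:
    """Sum of per-token log-probabilities log p(w_t | w_<t) from an autoregressive LM."""
    return sum(token_logprobs)


def crows_pairs_bias_score(
    pairs: Sequence[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Percentage of pairs where the model prefers the more stereotypical sentence.

    `pairs` holds (more_stereotypical, less_stereotypical) sentences and `score`
    maps a sentence to its log-likelihood under the model (hypothetical interface).
    An unbiased model would land near 50.0. Ties count as "not preferred".
    """
    if not pairs:
        raise ValueError("need at least one sentence pair")
    preferred = sum(1 for more, less in pairs if score(more) > score(less))
    return 100.0 * preferred / len(pairs)


# Usage with a toy lookup table standing in for a real model's log-likelihoods.
toy_logliks = {"a": -10.0, "b": -12.0, "c": -5.0, "d": -3.0}
result = crows_pairs_bias_score([("a", "b"), ("c", "d")], toy_logliks.__getitem__)
```

Summing raw log-probabilities favours shorter sentences; because CrowS-Pairs sentences form minimal pairs of similar length this is often tolerated, but length-normalising the score is a reasonable alternative design choice.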

- `extract_metrics.py`: collects the parameter statistics used to induce the HMM training maps.
- `training_map.py`: finds the best training maps and visualizes the Markov chains.
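To visualize a training map as a Markov chain, one needs transition probabilities between latent states. The sketch below assumes each training run has already been decoded into a sequence of discrete HMM states (for example, the most likely state path per checkpoint) and estimates the transition matrix by maximum likelihood, i.e., by normalised transition counts. The function name `estimate_transitions` and this input format are hypothetical illustrations, not code from `training_map.py`.

```python
from collections import Counter
from typing import List, Sequence


def estimate_transitions(
    state_sequences: Sequence[Sequence[int]], n_states: int
) -> List[List[float]]:
    """Maximum-likelihood Markov-chain transition matrix from decoded state sequences.

    Entry [i][j] is the fraction of observed transitions out of state i that go
    to state j. Rows for states that are never left remain all zeros.
    """
    counts: Counter = Counter()
    for seq in state_sequences:
        # Count consecutive (source, destination) pairs within each run.
        for src, dst in zip(seq, seq[1:]):
            counts[(src, dst)] += 1

    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        total = sum(counts[(i, j)] for j in range(n_states))
        if total:
            for j in range(n_states):
                matrix[i][j] = counts[(i, j)] / total
    return matrix


# Usage: two hypothetical runs decoded into states 0, 1 and 2.
transitions = estimate_transitions([[0, 0, 1, 1, 2], [0, 1, 2, 2]], n_states=3)
```

The nonzero entries of the resulting matrix can be drawn as weighted edges between states, which is one straightforward way to render the Markov chain.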