evaluation
Add HuffPo Text Classification to Full Benchmark
Use this dataset to test generalization to unseen labels; maybe use FLEX?
I will do this.
I would like to help with this.
Here: https://github.com/bigscience-workshop/promptsource/pull/750