Add BLiMP task
One thing I was unsure about is how to report model performance on individual subtasks: within BLiMP it would be a bit odd to merge all accuracies into a single number, but given the number of datasets involved, we may not want to split every task into subtasks either.
However, if we do want that split to be present as well, I can easily add it. Can the `self.metrics` dictionary contain any kind of entry?
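For illustration, here is a minimal sketch of the kind of entry I have in mind, assuming `self.metrics` is a plain dict; the `blimp` key, the subtask names, and the accuracy numbers are all hypothetical, not the harness's actual API:

```python
# Hypothetical sketch: per-subtask accuracies nested under one task key,
# plus a macro-averaged aggregate. All names and numbers are illustrative.
subtask_counts = {
    "anaphor_agreement": (95, 100),   # (num_correct, num_total)
    "argument_structure": (88, 100),
}

metrics = {}
# One nested entry holding the per-subtask accuracies.
metrics["blimp"] = {
    name: correct / total for name, (correct, total) in subtask_counts.items()
}
# One scalar entry with the unweighted (macro) average over subtasks.
metrics["blimp_avg"] = sum(metrics["blimp"].values()) / len(metrics["blimp"])
```

This keeps the single aggregate number available while still exposing the per-subtask breakdown, but it only works if the metrics dictionary accepts nested values.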