Move `kwargs` from `compute` to a config pass during `load`
Currently, there is a mix between configs that are used when the metric is instantiated and configs that are later passed to compute. This makes life harder in several ways as pointed out in #137 and #138 and also makes the combine function from #150 harder to use. The two main points:
- It is hard to know in advance what configs are available (e.g. useful for eval on the hub)
- It is hard to configure metrics in advance that are used in a wrapped object. One always needs to pass the kwargs downstream (affects
evaluatorandcombine).
I think we could solve this by moving all kwargs to the load. If a user wants to run a metric with different configs they can just load several instances of the metric which is cheap. In addition we could wrap the configs in something like a dataclass that specifies the types of configs and options when only a limited number of options are available (e.g. F1 can only be use binary or multilabel).
What do you think? @lewtun @lhoestq @sashavor
I like this proposal to move all kwargs to the load() function a lot!
To help gauge the difference, could you share some pseudocode that demonstrates what the new API would look like for a "simple" (few kwargs) vs "complex" metric (mix of configs and kwargs)?
Is this actually solved since it is reverted in https://github.com/huggingface/evaluate/pull/299 ?
I'm also confused as to where this is at? I'm not even sure if this is even where it is being tracked.