evaluate Move `kwargs` from `compute` to a config pass during `load`

Currently, there is a mix between configs that are used when the metric is instantiated and configs that are later passed to compute. This makes life harder in several ways as pointed out in #137 and #138 and also makes the combine function from #150 harder to use. The two main points:

It is hard to know in advance what configs are available (e.g. useful for eval on the hub)
It is hard to configure metrics in advance that are used in a wrapped object. One always needs to pass the kwargs downstream (affects evaluator and combine).

I think we could solve this by moving all kwargs to the load. If a user wants to run a metric with different configs they can just load several instances of the metric which is cheap. In addition we could wrap the configs in something like a dataclass that specifies the types of configs and options when only a limited number of options are available (e.g. F1 can only be use binary or multilabel).

What do you think? @lewtun @lhoestq @sashavor

Jun 29 '22 16:06 lvwerra

I like this proposal to move all kwargs to the load() function a lot!

To help gauge the difference, could you share some pseudocode that demonstrates what the new API would look like for a "simple" (few kwargs) vs "complex" metric (mix of configs and kwargs)?

Jun 30 '22 07:06 lewtun

Is this actually solved since it is reverted in https://github.com/huggingface/evaluate/pull/299 ?

Oct 30 '22 13:10 hfawaz

I'm also confused as to where this is at? I'm not even sure if this is even where it is being tracked.

May 18 '23 16:05 christian-storm