Cannot use evaluate.TextClassificationEvaluator for multi-label classification

Open lorenzobalzani opened this issue 2 years ago • 0 comments

System Info

transformers version: 4.35.2
Platform: Linux-5.15.120+-x86_64-with-glibc2.35
Python version: 3.10.12
Huggingface_hub version: 0.19.4
Safetensors version: 0.4.1
Accelerate version: 0.25.0
Accelerate config: not found
PyTorch version (GPU?): 2.1.0+cu118 (True)
Tensorflow version (GPU?): 2.14.0 (True)
Flax version (CPU?/GPU?/TPU?): 0.7.5 (gpu)
Jax version: 0.4.20
JaxLib version: 0.4.20
Using GPU in script? YES
Using distributed or parallel set-up in script? NO

Who can help?

No response

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)

Reproduction

I am working on emotion classification and would like to evaluate the model with evaluate.TextClassificationEvaluator.

Running this example, I encounter the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-61f332f138dc> in <cell line: 3>()
      1 task_evaluator = evaluate.evaluator("sentiment-analysis")
      2 
----> 3 task_evaluator.compute(
      4     model_or_pipeline=model_hub_path,
      5     data=dataset,

7 frames
/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py in _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
   1389             if y_type == "multiclass":
   1390                 average_options.remove("samples")
-> 1391             raise ValueError(
   1392                 "Target is %s but average='binary'. Please "
   1393                 "choose another average setting, one of %r." % (y_type, average_options)

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

Expected behavior

I should be able to set an average argument when invoking task_evaluator.compute. Without it, the underlying metric falls back to average='binary', and I cannot use the evaluator for multi-label classification tasks.
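
For reference, the ValueError itself can be reproduced outside the evaluator with scikit-learn alone. This is only a minimal sketch: it assumes the failing metric is an F1-style score (the traceback shows only `_check_set_wise_labels`, not which metric was called), and the labels below are made-up toy data, not my dataset.

```python
from sklearn.metrics import f1_score

# Toy data with three classes (hypothetical, for illustration only).
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]

# With the default average="binary", scikit-learn rejects multiclass targets.
try:
    f1_score(y_true, y_pred)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Passing an explicit averaging strategy makes the same call succeed.
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(error_message)
print(f"macro F1: {macro_f1:.3f}")
```

If task_evaluator.compute could forward an average value like this to the metric, the evaluator would presumably work for this task.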

Dec 12 '23 08:12 lorenzobalzani