Cannot use evaluate.TextClassificationEvaluator for multi-label classification
System Info
- transformers version: 4.35.2
- Platform: Linux-5.15.120+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.19.4
- Safetensors version: 0.4.1
- Accelerate version: 0.25.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0+cu118 (True)
- Tensorflow version (GPU?): 2.14.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.7.5 (gpu)
- Jax version: 0.4.20
- JaxLib version: 0.4.20
- Using GPU in script? YES
- Using distributed or parallel set-up in script? NO
Who can help?
No response
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
I am working on emotion classification and would like to evaluate the model with this evaluator.
Running this example, I encounter the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-23-61f332f138dc> in <cell line: 3>()
1 task_evaluator = evaluate.evaluator("sentiment-analysis")
2
----> 3 task_evaluator.compute(
4 model_or_pipeline=model_hub_path,
5 data=dataset,
7 frames
/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py in _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
1389 if y_type == "multiclass":
1390 average_options.remove("samples")
-> 1391 raise ValueError(
1392 "Target is %s but average='binary'. Please "
1393 "choose another average setting, one of %r." % (y_type, average_options)
ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
Expected behavior
I should be able to set an average argument when invoking task_evaluator.compute. Without it, I cannot use the evaluator for multi-label classification tasks.
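To illustrate what a non-binary average setting would compute, here is a minimal pure-Python sketch of macro-averaged F1, one of the options listed in the error message. It is not the evaluator's or scikit-learn's implementation. The function names and the label lists are hypothetical and chosen only for illustration.

```python
def per_class_f1(references, predictions):
    """Return {label: F1}, treating each label in turn as the positive class."""
    labels = sorted(set(references) | set(predictions))
    pairs = list(zip(references, predictions))
    scores = {}
    for label in labels:
        tp = sum(1 for r, p in pairs if r == label and p == label)
        fp = sum(1 for r, p in pairs if r != label and p == label)
        fn = sum(1 for r, p in pairs if r == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(references, predictions):
    """Unweighted mean of the per-class F1 scores (the 'macro' setting)."""
    scores = per_class_f1(references, predictions)
    return sum(scores.values()) / len(scores)


# Hypothetical three-class labels (e.g. three emotions encoded as 0, 1, 2).
references = [0, 1, 2, 2, 1, 0]
predictions = [0, 2, 2, 2, 1, 0]

print(per_class_f1(references, predictions))
print(macro_f1(references, predictions))
```

With more than two classes there is no single positive class to score, which is why the binary setting is rejected. Macro averaging avoids the problem by scoring every class separately and averaging the results.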