Token classification bootstrap crashing with custom dataset

Open darebfh opened this issue 2 years ago • 0 comments

OS: macOS 14.1 Python: 3.11.6 PyTorch: 2.0.1

Description: Standard evaluation of a custom dataset works:

from datasets import Dataset
from evaluate import evaluator

dataset = Dataset.from_list(dictlist)

task_evaluator = evaluator("token-classification")

eval_results = task_evaluator.compute(
    model_or_pipeline=<model_path>,
    data=<custom_dataset["validation"]>,
    metric="seqeval",
    label_column="tags"
)

However, when adding bootstrapping, I get a crash:

eval_results = task_evaluator.compute(
    model_or_pipeline=<model_path>,
    data=<custom_dataset["validation"]>,
    metric="seqeval",
    label_column="tags",
    strategy="bootstrap",
    n_resamples=30,
)

Potential solution: Add a parameter for "zero_division", as suggested in the warning below.

Stacktrace:

UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use zero_division parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Traceback (most recent call last):
  , line 32, in
    eval_results = task_evaluator.compute(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  line 266, in compute
    metric_results = self.compute_metric(
                     ^^^^^^^^^^^^^^^^^^^^
  line 531, in compute_metric
    bootstrap_dict = self._compute_confidence_interval(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  line 147, in _compute_confidence_interval
    bs = bootstrap(
         ^^^^^^^^^^
  line 450, in bootstrap
    args = _bootstrap_iv(data, statistic, vectorized, paired, axis,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  line 155, in _bootstrap_iv
    sample = np.atleast_1d(sample)
             ^^^^^^^^^^^^^^^^^^^^^
  line 65, in atleast_1d
    ary = asanyarray(ary)
          ^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (416,) + inhomogeneous part.
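Reading the trace, the final ValueError looks like it comes from NumPy being asked to turn the 416 per-sentence tag lists, which have different lengths, into one rectangular array, rather than from the zero_division warnings themselves. As a possible interim workaround, here is a minimal sketch of a percentile bootstrap that resamples sentence indices in plain Python, so the ragged sequences are never stacked into an array. The names `token_accuracy` and `bootstrap_ci` are hypothetical helpers, not part of evaluate or SciPy, and token accuracy is only a stand-in for seqeval; the interval uses a crude nearest-rank percentile.

```python
import random


def token_accuracy(references, predictions):
    """Stand-in metric: fraction of tokens whose predicted tag matches the reference."""
    correct = 0
    total = 0
    for ref_seq, pred_seq in zip(references, predictions):
        for ref_tag, pred_tag in zip(ref_seq, pred_seq):
            correct += ref_tag == pred_tag
            total += 1
    return correct / total if total else 0.0


def bootstrap_ci(references, predictions, metric,
                 n_resamples=30, confidence_level=0.95, seed=0):
    """Percentile bootstrap over sentence indices (hypothetical helper).

    Sentences are resampled with replacement by index, so sequences of
    different lengths never need to share one rectangular array.
    """
    rng = random.Random(seed)
    n = len(references)
    scores = []
    for _ in range(n_resamples):
        # Draw n sentence indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([references[i] for i in idx],
                             [predictions[i] for i in idx]))
    scores.sort()
    # Nearest-rank approximation of the lower and upper percentiles.
    alpha = (1.0 - confidence_level) / 2.0
    low = scores[int(alpha * (n_resamples - 1))]
    high = scores[int((1.0 - alpha) * (n_resamples - 1))]
    return low, high


if __name__ == "__main__":
    refs = [["O", "B-PER"], ["O"], ["B-LOC", "I-LOC", "O"]]
    preds = [["O", "B-PER"], ["B-PER"], ["B-LOC", "O", "O"]]
    print(bootstrap_ci(refs, preds, token_accuracy))
```

Any metric that accepts two lists of tag sequences could be passed in place of `token_accuracy`, assuming the predictions have already been collected per sentence.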

darebfh, Feb 01 '24 15:02