evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
This is a proposed refactor to the `perplexity` metric which would bring `perplexity` closer to the other metrics in `evaluate`, which generally do not run inference in their `compute` functions,...
Merging with the open docs PR for perplexity, #238. Closes #241.
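To illustrate the direction, here is a minimal sketch of computing perplexity from per-token log-probabilities produced ahead of time, so that no model inference happens inside the metric itself (the helper name and input format are illustrative, not the PR's actual interface):

```python
import math

def perplexity_from_log_probs(token_log_probs):
    # Hypothetical helper: perplexity is the exponential of the mean
    # negative log-likelihood over the tokens.
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Per-token log-probabilities produced beforehand by any causal LM.
print(perplexity_from_log_probs([-2.1, -0.3, -1.7, -0.9]))  # ~3.49
```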
Previously, `evaluator.compute(..., data='imdb', ...)` would fail because loading the dataset returned an object of type `datasets.DatasetDict`. This change automatically detects a split if none is given (i.e. the user passes in the dataset...
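A rough sketch of the kind of split resolution described, assuming the evaluator may receive either a `Dataset` or a `DatasetDict` (the function name and preference order below are illustrative, not the PR's actual code):

```python
from datasets import DatasetDict, load_dataset

def resolve_split(data, preferred=("test", "validation", "train")):
    # If a DatasetDict is passed, pick a split rather than failing.
    if isinstance(data, DatasetDict):
        for split in preferred:
            if split in data:
                return data[split]
        return data[next(iter(data))]  # fall back to the first available split
    return data  # already a single Dataset

ds = resolve_split(load_dataset("imdb"))  # DatasetDict -> a single split
```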
NIST is a somewhat older but well-known metric for MT that is similar to BLEU. I'd like to add it to the base arsenal of `evaluate`. Core work is...
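For reference, NLTK already ships a NIST implementation that a wrapper metric could build on; a small usage sketch with plain whitespace tokenization:

```python
from nltk.translate.nist_score import corpus_nist, sentence_nist

hypothesis = "the cat sat on the mat".split()
references = [
    "there is a cat on the mat".split(),
    "a cat sits on the mat".split(),
]

# Sentence-level NIST; n is the maximum n-gram order.
print(sentence_nist(references, hypothesis, n=4))

# Corpus-level: one list of reference translations per hypothesis.
print(corpus_nist([references], [hypothesis], n=4))
```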
Small but important bug in the setup file: there was no comma between `cookiecutter` and `gradio`, so pip tried to resolve them as `cookiecuttergradio>=3.0.0`. Closes https://github.com/huggingface/evaluate/issues/249
Installing with the optional `[template]` extra leads to a bug:

> ERROR: Could not find a version that satisfies the requirement cookiecuttergradio>=3.0.0 (from evaluate[template]) (from versions: none)
> ERROR: No...
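The root cause is Python's implicit concatenation of adjacent string literals; a minimal illustration (the `template` extras key is the one reported, the exact list contents live in the repo's `setup.py`):

```python
# Before: the missing comma makes Python merge the two string literals,
# so pip sees one bogus requirement "cookiecuttergradio>=3.0.0".
extras = {"template": ["cookiecutter" "gradio>=3.0.0"]}

# After: the comma keeps each requirement as its own element.
extras = {"template": ["cookiecutter", "gradio>=3.0.0"]}
```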
Currently, when you give a different number of references and hypotheses to chrf, you get this error:

> Sacrebleu requires the same number of references for each prediction

When you...
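One way to produce a clearer message is to validate the counts before handing the inputs to sacrebleu; a hypothetical sketch, not the metric's actual code:

```python
def check_input_lengths(predictions, references):
    # Raise a descriptive error instead of sacrebleu's generic one.
    if len(predictions) != len(references):
        raise ValueError(
            f"Got {len(predictions)} predictions but {len(references)} reference "
            "lists; chrF expects exactly one list of references per prediction."
        )

check_input_lengths(["hello there"], [["hello there general kenobi"]])
```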
Currently there are several different input/output formats possible in `Metrics`. We should standardize them as much as possible while respecting the following principle:

- inputs/outputs are easy to understand and...
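For comparison, the pattern most metrics already follow, and which a standardization effort would presumably converge on, is flat `predictions`/`references` inputs and a plain dict of named scores as output:

```python
import evaluate

accuracy = evaluate.load("accuracy")
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {"accuracy": 0.75}
```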
When I ran `import evaluate` in a Kaggle notebook, the import automatically filled 15.4 GB of GPU memory, and I could not train the model further due to lack...