evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
This is a proposed refactor to the `perplexity` metric which would bring `perplexity` closer to the other metrics in `evaluate`, which generally do not run inference in their `compute` functions,...
Merging with the open docs PR for perplexity, #238. Closes #241.
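To illustrate the direction, here is a minimal sketch of computing perplexity from per-token log-probabilities produced ahead of time, so that no model inference happens inside the metric itself (the helper name and input format are illustrative, not the PR's actual interface):

```python
import math

def perplexity_from_log_probs(token_log_probs):
    # Hypothetical helper: perplexity is the exponential of the mean
    # negative log-likelihood over the tokens.
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Per-token log-probabilities produced beforehand by any causal LM.
print(perplexity_from_log_probs([-2.1, -0.3, -1.7, -0.9]))  # ~3.49
```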
Previously, `evaluator.compute(..., data='imdb', ...)` would fail because loading the dataset returned an object of type `datasets.DatasetDict`. This change automatically detects a split if none is given (i.e. the user passes in the dataset...
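A rough sketch of the kind of split resolution described, assuming the evaluator may receive either a `Dataset` or a `DatasetDict` (the function name and preference order below are illustrative, not the PR's actual code):

```python
from datasets import DatasetDict, load_dataset

def resolve_split(data, preferred=("test", "validation", "train")):
    # If a DatasetDict is passed, pick a split rather than failing.
    if isinstance(data, DatasetDict):
        for split in preferred:
            if split in data:
                return data[split]
        return data[next(iter(data))]  # fall back to the first available split
    return data  # already a single Dataset

ds = resolve_split(load_dataset("imdb"))  # DatasetDict -> a single split
```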
NIST is a somewhat older but well-known metric for MT that is similar to BLEU. I'd like to add it to the base arsenal of `evaluate`. Core work is...
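For reference, NLTK already ships a NIST implementation that a wrapper metric could build on; a small usage sketch with plain whitespace tokenization:

```python
from nltk.translate.nist_score import corpus_nist, sentence_nist

hypothesis = "the cat sat on the mat".split()
references = [
    "there is a cat on the mat".split(),
    "a cat sits on the mat".split(),
]

# Sentence-level NIST; n is the maximum n-gram order.
print(sentence_nist(references, hypothesis, n=4))

# Corpus-level: one list of reference translations per hypothesis.
print(corpus_nist([references], [hypothesis], n=4))
```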
Small but important bug in the setup file: there was no comma between `cookiecutter` and `gradio`, so pip tried to resolve them as `cookiecuttergradio>=3.0.0`. Closes https://github.com/huggingface/evaluate/issues/249
Installing with the optional `[template]` extra leads to a bug:

> ERROR: Could not find a version that satisfies the requirement cookiecuttergradio>=3.0.0 (from evaluate[template]) (from versions: none)
> ERROR: No...
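The root cause is Python's implicit concatenation of adjacent string literals; a minimal illustration (the `template` extras key is the one reported, the exact list contents live in the repo's `setup.py`):

```python
# Before: the missing comma makes Python merge the two string literals,
# so pip sees one bogus requirement "cookiecuttergradio>=3.0.0".
extras = {"template": ["cookiecutter" "gradio>=3.0.0"]}

# After: the comma keeps each requirement as its own element.
extras = {"template": ["cookiecutter", "gradio>=3.0.0"]}
```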
Currently, when you give a different number of references and hypotheses to chrf, you get this error:

> Sacrebleu requires the same number of references for each prediction

When you...
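One way to produce a clearer message is to validate the counts before handing the inputs to sacrebleu; a hypothetical sketch, not the metric's actual code:

```python
def check_input_lengths(predictions, references):
    # Raise a descriptive error instead of sacrebleu's generic one.
    if len(predictions) != len(references):
        raise ValueError(
            f"Got {len(predictions)} predictions but {len(references)} reference "
            "lists; chrF expects exactly one list of references per prediction."
        )

check_input_lengths(["hello there"], [["hello there general kenobi"]])
```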
Currently there are several different input/output formats possible in `Metrics`. We should standardize them as much as possible while respecting the following principle:

- inputs/outputs are easy to understand and...
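For comparison, the pattern most metrics already follow, and which a standardization effort would presumably converge on, is flat `predictions`/`references` inputs and a plain dict of named scores as output:

```python
import evaluate

accuracy = evaluate.load("accuracy")
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {"accuracy": 0.75}
```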
When I ran `import evaluate` in a Kaggle notebook, the import automatically filled 15.4 GB of GPU memory, and I could not train the model further due to lack...