lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally alongside its recently released LLM data-processing library, datatrove, and its LLM training library, nanotron.

276 lighteval issues (sorted by recently updated)

Hi there! 🤗 It seems that `drop_metrics` selects only the first span when an answer is of type multi-span (code: https://github.com/huggingface/lighteval/blob/ad42e43bcc3bd50fdba68936999bf553bf53b9e4/src/lighteval/metrics/harness_compatibility/drop.py#L149-L153). Maybe we could remove the first `if` statement and change...
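To illustrate why keeping only the first span matters, here is a minimal sketch of multi-span scoring, loosely modeled on the bag-alignment idea in the original DROP evaluation script. It is not lighteval's `drop_metrics` code: the function names are hypothetical, the pairing is brute-forced with `itertools.permutations` (the official script uses an assignment solver), and DROP's extra number-matching check is omitted.

```python
import re
import string
from collections import Counter
from itertools import permutations


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(pred: str, gold: str) -> float:
    """Bag-of-tokens F1 between one predicted span and one gold span."""
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p and not g:
        return 1.0
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def multi_span_f1(pred_spans: list[str], gold_spans: list[str]) -> float:
    """Find the one-to-one pairing of predicted and gold spans with the highest
    total F1; unmatched spans on the longer side contribute 0."""
    if not pred_spans or not gold_spans:
        return 0.0
    scores = [[token_f1(p, g) for p in pred_spans] for g in gold_spans]
    n_gold, n_pred = len(gold_spans), len(pred_spans)
    best = 0.0
    if n_gold <= n_pred:
        # Assign each gold span to a distinct predicted span.
        for cols in permutations(range(n_pred), n_gold):
            best = max(best, sum(scores[i][c] for i, c in enumerate(cols)))
    else:
        # Assign each predicted span to a distinct gold span.
        for rows in permutations(range(n_gold), n_pred):
            best = max(best, sum(scores[r][j] for j, r in enumerate(rows)))
    return best / max(n_gold, n_pred)
```

With gold spans `["Tom Brady", "Peyton Manning"]`, a prediction containing only `"Tom Brady"` scores 0.5 rather than 1.0, because the second gold span is left unmatched. Brute force is only reasonable for the handful of spans typical in DROP answers.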

Makes it possible for both model types to run TensorBoard evals. This should be good to go, but it will need to wait for an update in `huggingface_hub` before merging, as the...

What does this PR do? This PR fixes an error raised in `self._init_max_length(config.max_length)`; I added a try-except to avoid it. Error: an `AttributeError` occurred while processing `load_model()` for...
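A minimal sketch of the defensive pattern the PR describes: instead of letting a missing config attribute raise `AttributeError` inside model loading, try a few candidate attributes and then fall back to a default. The function name, the attribute list, and the 2048 default are assumptions for illustration, not lighteval's actual `_init_max_length`.

```python
from types import SimpleNamespace
from typing import Optional

DEFAULT_MAX_LENGTH = 2048  # hypothetical fallback, not a lighteval constant

# Config fields that commonly hold a model's context length in Hugging Face configs.
CONTEXT_LENGTH_ATTRS = ("n_positions", "max_position_embeddings", "n_ctx")


def init_max_length(model_config, max_length: Optional[int] = None) -> int:
    """Resolve a max length without crashing when the config lacks an attribute."""
    if max_length is not None:
        # An explicit value always wins.
        return max_length
    for attr in CONTEXT_LENGTH_ATTRS:
        try:
            return getattr(model_config, attr)
        except AttributeError:
            # This config does not define `attr`; try the next candidate.
            continue
    return DEFAULT_MAX_LENGTH


gpt2_like = SimpleNamespace(n_positions=1024)
bare = SimpleNamespace()
print(init_max_length(gpt2_like))  # 1024
print(init_max_length(bare))       # 2048
```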

Hi there! `LightevalTask.process_results()` does not expect requests to be ordered by type in the same way that `LightevalTask.get_request_type()` implies: `create_requests_from_tasks` returns a dict of requests whose keys' order follows the...
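Python dicts iterate in insertion order, so code that consumes responses positionally inherits whatever order the requests happened to be created in. A minimal sketch of the mismatch and one way around it (looking responses up by request type rather than by position); the `RequestType` enum and `order_responses` helper are hypothetical stand-ins, not lighteval's classes.

```python
from enum import Enum


class RequestType(Enum):  # hypothetical stand-in, not lighteval's own enum
    LOGLIKELIHOOD = "loglikelihood"
    GREEDY_UNTIL = "greedy_until"


def order_responses(responses_by_type, expected_types):
    """Return response lists in the order the consumer expects,
    looking each one up by key instead of trusting dict iteration order."""
    return [responses_by_type[t] for t in expected_types]


# Requests were created generation-first, so the dict iterates in that order.
responses_by_type = {
    RequestType.GREEDY_UNTIL: ["Paris"],
    RequestType.LOGLIKELIHOOD: [-0.4, -2.1],
}
# The order a result processor might assume, e.g. from a per-task type list.
expected_types = [RequestType.LOGLIKELIHOOD, RequestType.GREEDY_UNTIL]

naive = list(responses_by_type.values())  # generation responses come first
fixed = order_responses(responses_by_type, expected_types)
```

Keying by type makes the consumer independent of how the producer happened to build the dict.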

Hi there! This PR fixes a few typos and does a small refactor.

Nanotron tests can be run in both full and lite modes. Tasks are defined by modifying the configuration file.

Hi there. `apply_target_perplexity_metric` seems to pop only the first related response rather than all of them (code: https://github.com/huggingface/lighteval/blob/ad42e43bcc3bd50fdba68936999bf553bf53b9e4/src/lighteval/metrics/__init__.py#L31-L36), even though there are as many related responses as there are gold choices.
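If each sample contributes one response per gold choice, popping a single response per sample leaves the rest in the queue, where they get attributed to later samples (in the example below, sample B would receive `a2`). A minimal sketch of consuming the right number of responses per sample; `split_responses` and the string placeholders are hypothetical, not lighteval's `apply_target_perplexity_metric`.

```python
from typing import Sequence


def split_responses(responses: Sequence, gold_counts: Sequence[int]):
    """Give each sample as many responses as it has gold choices,
    instead of popping a single response per sample."""
    remaining = list(responses)
    grouped = []
    for n_gold in gold_counts:
        grouped.append(remaining[:n_gold])
        remaining = remaining[n_gold:]
    return grouped, remaining


# Sample A has 2 gold choices, B has 1, C has 3.
responses = ["a1", "a2", "b1", "c1", "c2", "c3"]
grouped, leftover = split_responses(responses, [2, 1, 3])
```

An empty `leftover` is a cheap sanity check that every response was assigned to exactly one sample.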

Hi there! This PR:
- ~~Imports openai in `llm_as_judge.py` lazily.~~
- Downloads the BERT model in `bert_scorer.py` lazily.
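Lazy loading defers an expensive step until the first time it is actually needed, so loading the metrics module stays cheap for runs that never compute BERTScore. A minimal sketch of the pattern, where `loader` is a hypothetical stand-in for whatever downloads the BERT model; this is not the code in `bert_scorer.py`.

```python
from typing import Callable, Optional


class LazyBertScorer:
    """Defers an expensive model download until the first score() call."""

    def __init__(self, loader: Callable[[], Callable[[str, str], float]]):
        self._loader = loader  # hypothetical: downloads and returns a scoring model
        self._model: Optional[Callable[[str, str], float]] = None
        self.load_count = 0  # how many times the loader has actually run

    def _get_model(self) -> Callable[[str, str], float]:
        if self._model is None:
            # First use: run the expensive loader once and cache the result.
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def score(self, candidate: str, reference: str) -> float:
        return self._get_model()(candidate, reference)
```

Constructing the scorer costs nothing; only the first `score()` call triggers the download, and later calls reuse the cached model.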

Dear lighteval team, with fsspec==2023.12.1 my runs failed once the evaluation had finished. I am now using fsspec==2024.3.1 and it works fine. Although I...
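A minimal sketch of a startup guard that flags an older fsspec before a long evaluation runs. It assumes, based only on this report, that 2024.3.1 is a version known to work; the true minimum was not established here, and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: the version the reporter found working, not a verified minimum.
REPORTED_WORKING = "2024.3.1"


def parse_version(v: str) -> tuple:
    """Turn 'YYYY.M.P' (ignoring suffixes like 'post0' or 'rc1') into an int tuple."""
    parts = []
    for piece in v.split(".")[:3]:
        m = re.match(r"\d+", piece)
        parts.append(int(m.group()) if m else 0)
    return tuple(parts)


def fsspec_is_recent_enough(minimum: str = REPORTED_WORKING) -> bool:
    """True if an installed fsspec is at least `minimum`; False if older or absent."""
    try:
        installed = version("fsspec")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)
```

For anything beyond simple date-style versions, `packaging.version.Version` is the more robust comparison.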