lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside the recently released LLM data-processing library datatrove and the LLM training library nanotron.
Hi there! 🤗 It seems that `drop_metrics` selects only the first span when an answer is of type multi-span: https://github.com/huggingface/lighteval/blob/ad42e43bcc3bd50fdba68936999bf553bf53b9e4/src/lighteval/metrics/harness_compatibility/drop.py#L149-L153 Maybe we could remove the first `if` statement and change...
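To illustrate the reported behavior (a minimal sketch only; the function and answer format here are hypothetical, not lighteval's actual `drop.py` code): a multi-span gold answer should contribute every span, whereas code that special-cases the first element silently drops the rest.

```python
def answer_to_spans(answer):
    """Normalize an answer into a list of spans (hypothetical helper).

    A multi-span answer like ("New York", "Boston") should yield both
    spans; taking only answer[0] would drop "Boston".
    """
    if isinstance(answer, (list, tuple)):
        return [str(span) for span in answer]  # keep every span
    return [str(answer)]                       # single-span answer

print(answer_to_spans(("New York", "Boston")))  # ['New York', 'Boston']
print(answer_to_spans("Paris"))                 # ['Paris']
```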
Make it so both model types can run TensorBoard evals. This should be good to go, but will need to wait for an update in `huggingface_hub` before merging, as the...
## What does this PR do? This PR fixes an error raised by `self._init_max_length(config.max_length)`. I added a try-except to avoid the error. ## Error An `AttributeError` occurred while processing `load_model()` for...
Hi there! `LightevalTask.process_results()` does not expect the same ordering of requests by type as what is implied by `LightevalTask.get_request_type()`: `create_requests_from_tasks` returns a dict of requests whose keys' order follows the...
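The kind of mismatch described can be sketched as follows (a hypothetical illustration, not lighteval's actual data structures): a dict grouped in first-seen order need not match a fixed expected order, so a consumer that assumes the fixed order must reorder explicitly.

```python
from collections import defaultdict

# Hypothetical request types; the real ones live in lighteval's request enums.
EXPECTED_ORDER = ["LOGLIKELIHOOD", "GREEDY_UNTIL"]

requests = [("GREEDY_UNTIL", "req0"), ("LOGLIKELIHOOD", "req1")]

# Grouping preserves first-seen insertion order, not EXPECTED_ORDER:
by_type = defaultdict(list)
for rtype, req in requests:
    by_type[rtype].append(req)
print(list(by_type))  # ['GREEDY_UNTIL', 'LOGLIKELIHOOD']

# A consumer relying on EXPECTED_ORDER should reorder explicitly:
ordered = {rtype: by_type[rtype] for rtype in EXPECTED_ORDER if rtype in by_type}
print(list(ordered))  # ['LOGLIKELIHOOD', 'GREEDY_UNTIL']
```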
Hi there! To fix a few typos and do a tiny refactor.
Tests for Nanotron can be run in both full and lite modes. Tasks can be defined by modifying the configuration file.
Hi there. `apply_target_perplexity_metric` seems to pop only the first related response, not all of them. https://github.com/huggingface/lighteval/blob/ad42e43bcc3bd50fdba68936999bf553bf53b9e4/src/lighteval/metrics/__init__.py#L31-L36 Yet there are as many related responses as there are gold choices.
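A minimal sketch of the expected behavior (function and data shapes are illustrative, not lighteval's actual code): since one response is produced per gold choice, the metric should consume that many responses, not just one.

```python
def pop_target_responses(responses, gold_choices):
    """Hypothetical sketch: one response is expected per gold choice,
    so pop as many responses as there are golds -- not just the first."""
    return [responses.pop(0) for _ in gold_choices]

responses = ["resp_a", "resp_b", "resp_c"]
golds = ["choice_a", "choice_b"]
popped = pop_target_responses(responses, golds)
print(popped)     # ['resp_a', 'resp_b']
print(responses)  # ['resp_c'] is left for the next item
```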
Hi there! - ~~To import openai in `llm_as_judge.py` lazily.~~ - To download BERT model in `bert_scorer.py` lazily.
Dear lighteval team, with fsspec==2023.12.1 I had an issue where the run failed once the evaluation was done. I am now using fsspec==2024.3.1 and it works fine. Although I...
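Based on the report above, one could guard against the too-old fsspec before launching a run (a hedged sketch; the 2024.3.1 lower bound is taken from this report, not from an official compatibility matrix):

```python
def parse_calver(v):
    """Parse an fsspec-style CalVer string like '2024.3.1' into an int tuple."""
    return tuple(int(part) for part in v.split("."))

# The report suggests 2023.12.1 fails after evaluation while 2024.3.1 works,
# so a conservative lower bound could be:
MIN_FSSPEC = (2024, 3, 1)

print(parse_calver("2023.12.1") >= MIN_FSSPEC)  # False -> too old
print(parse_calver("2024.3.1") >= MIN_FSSPEC)   # True
```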