lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside the recently released LLM data-processing library datatrove and the LLM training library nanotron.
Hi there! 🤗 It seems that `drop_metrics` selects only the first span when an answer is of type multi-span: https://github.com/huggingface/lighteval/blob/ad42e43bcc3bd50fdba68936999bf553bf53b9e4/src/lighteval/metrics/harness_compatibility/drop.py#L149-L153 Maybe we could remove the first `if` statement and change...
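To illustrate the reported behavior (a minimal sketch only; the function and answer format here are hypothetical, not lighteval's actual `drop.py` code): a multi-span gold answer should contribute every span, whereas code that special-cases the first element silently drops the rest.

```python
def answer_to_spans(answer):
    """Normalize an answer into a list of spans (hypothetical helper).

    A multi-span answer like ("New York", "Boston") should yield both
    spans; taking only answer[0] would drop "Boston".
    """
    if isinstance(answer, (list, tuple)):
        return [str(span) for span in answer]  # keep every span
    return [str(answer)]                       # single-span answer

print(answer_to_spans(("New York", "Boston")))  # ['New York', 'Boston']
print(answer_to_spans("Paris"))                 # ['Paris']
```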
Make it so both model types can run TensorBoard evals. This should be good to go, but will need to wait for an update in `huggingface_hub` before merging, as the...
## What does this PR do? This PR fixes an error raised by `self._init_max_length(config.max_length)`. I added a try-except to avoid the error. ## Error An `AttributeError` occurred while processing `load_model()` for...
Hi there! `LightevalTask.process_results()` does not expect the same ordering of requests by type as what is implied by `LightevalTask.get_request_type()`: `create_requests_from_tasks` returns a dict of requests whose keys' order follows the...
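The kind of mismatch described can be sketched as follows (a hypothetical illustration, not lighteval's actual data structures): a dict grouped in first-seen order need not match a fixed expected order, so a consumer that assumes the fixed order must reorder explicitly.

```python
from collections import defaultdict

# Hypothetical request types; the real ones live in lighteval's request enums.
EXPECTED_ORDER = ["LOGLIKELIHOOD", "GREEDY_UNTIL"]

requests = [("GREEDY_UNTIL", "req0"), ("LOGLIKELIHOOD", "req1")]

# Grouping preserves first-seen insertion order, not EXPECTED_ORDER:
by_type = defaultdict(list)
for rtype, req in requests:
    by_type[rtype].append(req)
print(list(by_type))  # ['GREEDY_UNTIL', 'LOGLIKELIHOOD']

# A consumer relying on EXPECTED_ORDER should reorder explicitly:
ordered = {rtype: by_type[rtype] for rtype in EXPECTED_ORDER if rtype in by_type}
print(list(ordered))  # ['LOGLIKELIHOOD', 'GREEDY_UNTIL']
```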
Hi there! To fix a few typos and do a tiny refactor.
Tests for Nanotron can be run in both full and lite modes. Tasks can be defined by modifying the configuration file.
Hi there. `apply_target_perplexity_metric` seems to pop only the first related response, not all of them. https://github.com/huggingface/lighteval/blob/ad42e43bcc3bd50fdba68936999bf553bf53b9e4/src/lighteval/metrics/__init__.py#L31-L36 Yet there are as many related responses as there are gold choices.
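A minimal sketch of the expected behavior (function and data shapes are illustrative, not lighteval's actual code): since one response is produced per gold choice, the metric should consume that many responses, not just one.

```python
def pop_target_responses(responses, gold_choices):
    """Hypothetical sketch: one response is expected per gold choice,
    so pop as many responses as there are golds -- not just the first."""
    return [responses.pop(0) for _ in gold_choices]

responses = ["resp_a", "resp_b", "resp_c"]
golds = ["choice_a", "choice_b"]
popped = pop_target_responses(responses, golds)
print(popped)     # ['resp_a', 'resp_b']
print(responses)  # ['resp_c'] is left for the next item
```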
Hi there! - ~~To import openai in `llm_as_judge.py` lazily.~~ - To download BERT model in `bert_scorer.py` lazily.
Dear lighteval team, with fsspec==2023.12.1 I had an issue where the run failed once the evaluation was done. I am now using fsspec==2024.3.1 and it works fine. Although I...
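Based on the report above, one could guard against the too-old fsspec before launching a run (a hedged sketch; the 2024.3.1 lower bound is taken from this report, not from an official compatibility matrix):

```python
def parse_calver(v):
    """Parse an fsspec-style CalVer string like '2024.3.1' into an int tuple."""
    return tuple(int(part) for part in v.split("."))

# The report suggests 2023.12.1 fails after evaluation while 2024.3.1 works,
# so a conservative lower bound could be:
MIN_FSSPEC = (2024, 3, 1)

print(parse_calver("2023.12.1") >= MIN_FSSPEC)  # False -> too old
print(parse_calver("2024.3.1") >= MIN_FSSPEC)   # True
```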