Nathan Habib

29 issues by Nathan Habib

- Removes unneeded script files used to launch lighteval
- Moves the parser functions to `lighteval.commands.parsers`
- Creates a `lighteval` CLI executable with 3 subcommands (`list-tasks`, `nanotron`, `accelerate`)

how to use: ###...
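A minimal sketch of how a single `lighteval` entry point with these three subcommands could be wired using `argparse`. The subcommand names come from the PR summary; everything else here is illustrative, not lighteval's actual implementation.

```python
# Hypothetical CLI layout: one `lighteval` executable with three
# subcommands (list-tasks, nanotron, accelerate), as described in the PR.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="lighteval")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("list-tasks", help="List all available tasks")
    subparsers.add_parser("nanotron", help="Evaluate a nanotron model")
    subparsers.add_parser("accelerate", help="Evaluate with accelerate")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["list-tasks"])
    print(args.command)
```

In the real CLI, each subparser would be populated by the parser functions moved to `lighteval.commands.parsers`.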

It is currently cumbersome to log details of what is happening inside metric functions (for example, logging the judge prompt in the LLM-as-judge metric). Passing them the evaluation tracker would...
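A hypothetical sketch of the requested change: the metric function receives the evaluation tracker and can log intermediate details such as the judge prompt. The `EvaluationTracker` class and the metric signature below are invented for illustration and do not reflect lighteval's actual API.

```python
# Hypothetical tracker and metric: all names here are illustrative.
from dataclasses import dataclass, field


@dataclass
class EvaluationTracker:
    details: list = field(default_factory=list)

    def log_detail(self, name: str, value: str) -> None:
        self.details.append({name: value})


def llm_judge_metric(prediction: str, reference: str,
                     tracker: EvaluationTracker) -> float:
    judge_prompt = (
        f"Rate this answer.\nReference: {reference}\nAnswer: {prediction}"
    )
    # With the tracker in scope, the prompt ends up in the run's logs.
    tracker.log_detail("judge_prompt", judge_prompt)
    # A real metric would call the judge model here; return a dummy score.
    return 1.0 if prediction == reference else 0.0


tracker = EvaluationTracker()
score = llm_judge_metric("Paris", "Paris", tracker)
```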

feature request

What this PR does:
- Adds a safeguard when computing the batch size so that models do not OOM
- Recomputes the batch size for each greedy-until task
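The safeguard can be sketched as a back-off loop: start from the requested batch size and halve it whenever an out-of-memory error is raised. `forward_pass` below is a stand-in for the actual model call, with the OOM condition simulated; the real PR's logic may differ.

```python
# Illustrative batch-size safeguard; forward_pass simulates a model call
# that raises (like torch does) when the batch does not fit in memory.
def forward_pass(batch_size: int, max_safe_batch: int) -> None:
    """Stand-in for a model forward pass."""
    if batch_size > max_safe_batch:
        raise RuntimeError("CUDA out of memory")


def find_safe_batch_size(start: int, max_safe_batch: int) -> int:
    batch_size = start
    while batch_size > 1:
        try:
            forward_pass(batch_size, max_safe_batch)
            return batch_size
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # only swallow OOM errors
            batch_size //= 2  # back off and retry with a smaller batch
    return 1
```

Recomputing this per greedy-until task matters because generation length, and therefore memory use, varies between tasks.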

What this PR does - Adds a try/except when applying the chat template, as some models do not support a system prompt in their templates.
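A sketch of that fallback, under the assumption that the fix folds the system prompt into the first user message when the template rejects a system role. `strict_template` is a toy stand-in for `tokenizer.apply_chat_template` on such a model; the actual PR may handle this differently.

```python
# Toy template that, like some models' chat templates, rejects system
# messages outright.
def strict_template(messages: list[dict]) -> str:
    for m in messages:
        if m["role"] == "system":
            raise ValueError("System role not supported")
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


def apply_with_fallback(messages: list[dict]) -> str:
    try:
        return strict_template(messages)
    except ValueError:
        # Fallback: prepend the system prompt to the first user message.
        system = " ".join(
            m["content"] for m in messages if m["role"] == "system"
        )
        merged, injected = [], False
        for m in messages:
            if m["role"] == "system":
                continue
            if m["role"] == "user" and not injected:
                merged.append(
                    {"role": "user", "content": f"{system}\n{m['content']}"}
                )
                injected = True
            else:
                merged.append(m)
        return strict_template(merged)


prompt = apply_with_fallback([
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "Hi"},
])
```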

Adds support for both model and data parallelism.

## Issue encountered
Lighteval does not allow evaluating models on tool usage.

## Solution/Feature
Add benchmarks for tool usage.

feature request

https://github.com/NVIDIA/RULER

Available context sizes: 4096, 8192, 16384, 32768, 65536, 131072

```
uv run lighteval vllm "model_name=meta-llama/Llama-3.1-8B,dtype=bfloat16,max_model_length=131072" "lighteval|ruler_{context size}|0|0"
```
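Assuming the `{context size}` placeholder in the command above is replaced by one of the listed sizes, a small helper can enumerate every resulting task spec:

```python
# Enumerate the RULER task specs for the context sizes listed above,
# following the "lighteval|ruler_{context size}|0|0" pattern from the
# command template.
CONTEXT_SIZES = [4096, 8192, 16384, 32768, 65536, 131072]


def ruler_task_specs(sizes: list[int] = CONTEXT_SIZES) -> list[str]:
    return [f"lighteval|ruler_{size}|0|0" for size in sizes]


for spec in ruler_task_specs():
    print(spec)
```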

new-task

## Evaluation short description
A Benchmark for Tool-Agent-User Interaction in Real-World Domains.

## Evaluation metadata
Provide all available:
- Paper URL:
- Github URL: https://github.com/sierra-research/tau-bench
- Dataset URL:

help wanted
new-task