lighteval
[EVAL] TauBench
Evaluation short description
A Benchmark for Tool-Agent-User Interaction in Real-World Domains.
Evaluation metadata
Provide all available
- Paper url:
- Github URL: https://github.com/sierra-research/tau-bench
- Dataset url: