Joel Niklaus
Hi guys, I like your huggingface models a lot! Thank you very much for that! I saw that you uploaded many models there, but unfortunately there is no model for...
For Germany, https://dejure.org/ could be added.
Thinking models like DeepSeek-R1 emit thinking tags in the output. Is there a way to filter these out easily? Currently they make it directly into the output and so mess...
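One possible way to filter such output is a small post-processing step applied before scoring. The sketch below is only an illustration: it assumes the reasoning is wrapped in `<think>...</think>` tags, and the helper name `strip_thinking` is hypothetical rather than part of any existing library. It also handles the case where only a closing `</think>` tag appears in the generation, for example when the opening tag was already part of the prompt.

```python
import re

# Assumption: the reasoning is delimited by <think> ... </think>.
# DOTALL lets the pattern span multiple lines; the non-greedy match
# stops at the first closing tag so several blocks are handled separately.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_thinking(text: str) -> str:
    """Return `text` without <think>...</think> blocks (hypothetical helper)."""
    cleaned = _THINK_BLOCK.sub("", text)
    # If a closing tag is left over without a matching opening tag,
    # keep only what follows the last closing tag.
    if "</think>" in cleaned:
        cleaned = cleaned.rsplit("</think>", 1)[1]
    return cleaned.strip()


if __name__ == "__main__":
    raw = "<think>Let me reason step by step.\nDone.</think>\nAnswer: 42"
    print(strip_thinking(raw))
```

Applying the filter to the raw generation before it reaches the metric keeps the stored output untouched, so the reasoning can still be inspected later if needed.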
Adds new community tasks with Swiss legal evaluations. Currently, translation tasks are supported, but others may follow in the future.
This is a first attempt at issue #496. However, we need the docs, which in turn currently depend on the model being initialized. The model itself is not actually needed,...
## Issue encountered Evaluating large models (>30B parameters) is hard, especially with limited hardware. When there are many metrics to be evaluated, it can significantly increase the time...
Some models are very expensive to run inference on (e.g., Llama-3.3-70B). When we need to rerun inference to add a new metric for example, it would be very time consuming...
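A minimal sketch of one way to avoid rerunning inference is to cache generations on disk, keyed by model name and prompt, so that a new metric can be computed from the stored outputs. Everything here is an assumption for illustration: the class `PredictionCache`, its method names, and the JSON file layout are hypothetical and not taken from any existing evaluation library.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict


class PredictionCache:
    """Hypothetical on-disk cache mapping (model, prompt) to a generation."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        # Load earlier generations if the cache file already exists.
        self._data: Dict[str, str] = (
            json.loads(self.path.read_text(encoding="utf-8"))
            if self.path.exists()
            else {}
        )

    @staticmethod
    def _key(model_name: str, prompt: str) -> str:
        # The NUL separator keeps ("ab", "c") and ("a", "bc") distinct.
        raw = f"{model_name}\x00{prompt}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get_or_generate(
        self, model_name: str, prompt: str, generate: Callable[[str], str]
    ) -> str:
        key = self._key(model_name, prompt)
        if key not in self._data:
            # Cache miss: run the expensive call once and persist the result.
            self._data[key] = generate(prompt)
            self.path.write_text(json.dumps(self._data), encoding="utf-8")
        return self._data[key]
```

With such a cache, adding a metric only requires iterating over the same prompts again; `generate` is called only for prompts that have not been seen for that model. Note that the sketch ignores generation parameters such as temperature, which would need to be part of the key if they vary between runs.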
## Issue encountered The JudgeLLM class currently does not support the litellm backend, which rules out judges such as Claude Sonnet. ## Solution/Feature Add support for the litellm backend in JudgeLLM.
## Issue encountered My models currently don't follow the template I give. I want to give a system prompt that nudges the models to provide output the way I want...