tju01

Results 14 issues of tju01

### Feature request

Please add the option to provide the prompt as a list of token ids instead of a string in the `text_generation.Client.generate` method.

### Motivation

The official implementation...

CoT currently consists of GSM8K, MATH, BBH, and MMLU. The first three are fine, and it has been shown that CoT gives significant improvements on these benchmarks. However, the...

bug
existing-benchmark

Make it possible for the user to specify the prompt template through command-line model args without needing to add code.

enhancement

Use the resulting JSON files in the `reports/` folder to generate some nice-looking plots like those in https://www.xlang.ai/blog/openlemur

enhancement

- Currently, it writes all results to stdout. It should instead only print the requested results.
- More details would be nice. Currently we only print a very coarse summary...

enhancement

enhancement
existing-benchmark

Something that can measure how well an LLM can deal with tools. CoT already kind of goes that way, but not really since it's limited to a bit of mathematical...

enhancement
new-benchmark