tju01 issues

Results: 14 issues of tju01

Allow client to provide prompt token ids instead of a string

### Feature request

Please add the option to provide the prompt as a list of token ids instead of a string in the `text_generation.Client.generate` method.

### Motivation

The official implementation...

Make MMLU closer to original implementation

CoT currently consists of GSM8K, MATH, BBH & MMLU. The first three are fine, and CoT has been shown to give significant improvements on these benchmarks. However, the...

bug

existing-benchmark

Custom prompt template

Make it possible for the user to specify the prompt template through command-line model args, without needing to add code.

enhancement
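
One way this request could look in practice is sketched below. It is only an illustration, not the project's actual interface: the `--model-args` flag, the `prompt_template` key, the `key=value,key=value` format, and the `{prompt}` placeholder are all assumptions made for the example.

```python
import argparse

# Hypothetical default used when no template is supplied.
DEFAULT_TEMPLATE = "User: {prompt}\nAssistant:"


def parse_model_args(raw: str) -> dict[str, str]:
    """Parse a 'key=value,key=value' string into a dict (assumed format).

    Limitation of this sketch: values cannot contain commas.
    """
    args = {}
    for pair in raw.split(","):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        args[key.strip()] = value
    return args


def render_prompt(template: str, prompt: str) -> str:
    """Fill the {prompt} placeholder in the template."""
    # Turn a literal backslash-n typed on the command line into a newline.
    return template.replace("\\n", "\n").format(prompt=prompt)


def main(argv=None) -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-args", default="")  # hypothetical flag
    parser.add_argument("--prompt", default="What is 2 + 2?")
    ns = parser.parse_args(argv)

    model_args = parse_model_args(ns.model_args)
    template = model_args.get("prompt_template", DEFAULT_TEMPLATE)
    return render_prompt(template, ns.prompt)


if __name__ == "__main__":
    print(main())
```

With this sketch, a template could be passed as `--model-args 'prompt_template=### Instruction:\n{prompt}\n### Response:'` without touching any code. A real implementation would likely need a more robust format so that templates can contain commas or literal braces.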

AgentBench

Create plot from the results

Use the resulting JSON files in the `reports/` folder to generate some nice-looking plots like those in https://www.xlang.ai/blog/openlemur

enhancement

Improve documentation

enhancement

Improve results written to stdout

- Currently, it writes all results to stdout. It should instead only print the requested results.
- More details would be nice. Currently we only print a very coarse summary...

enhancement

Custom test data: Allow specifying number of repetitions

enhancement

existing-benchmark

Allow HF dataset for custom test data

enhancement

existing-benchmark

Tool usage

Something that can measure how well an LLM can deal with tools. CoT already kind of goes that way, but not really since it's limited to a bit of mathematical...

enhancement

new-benchmark