Wonung Kim issues


Results: 1 issue by Wonung Kim

Benchmark results cannot be reproduced

I tested transnormerllm-385m with llm-eval-harness on the boolq benchmark. However, the result does not match the result you have reported. As well as the boolq benchmark and the 385m model, other benchmarks...