RLHF-Reward-Modeling
Update eval_bench_mark.py
Using len(names) instead of the hard-coded 13 makes it possible to run only part of the evaluation benchmark at a time. For machines that do not have much GPU memory, this can be helpful.
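A minimal sketch of the change being proposed: loop over however many subsets are actually configured (`len(names)`) rather than a hard-coded 13, so a shortened `names` list evaluates only those subsets. The `names` entries and `evaluate_subset` helper below are illustrative placeholders, not the real code from eval_bench_mark.py.

```python
def evaluate_subset(name: str) -> float:
    """Placeholder for the per-subset evaluation; returns a dummy score."""
    return 0.5

# Only two subsets configured, e.g. on a machine with limited GPU memory.
names = ["subset_a", "subset_b"]

# Before: for i in range(13)  -- assumed all 13 subsets were present.
# After: the loop bound follows the configured list.
scores = [evaluate_subset(names[i]) for i in range(len(names))]
print(len(scores))  # one score per configured subset
```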
Thanks for the PR! I am busy with some projects as finals approach... I will get back to you as soon as possible.
Hi, I just updated the evaluation script to support a weighted average over the different subsets. The current result now matches the official leaderboard. Could you update your pull request accordingly?
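For context, a weighted average over subsets means each subset's score contributes in proportion to its size instead of every subset counting equally. The subset names, scores, and sizes below are made-up illustrations, not the actual benchmark numbers:

```python
# Hypothetical per-subset accuracies and example counts.
subset_scores = {"subset_a": 0.90, "subset_b": 0.80}
subset_sizes = {"subset_a": 300, "subset_b": 100}

total = sum(subset_sizes.values())
# Weight each subset's score by its number of examples.
weighted_avg = sum(subset_scores[k] * subset_sizes[k] for k in subset_scores) / total
print(weighted_avg)  # (0.90*300 + 0.80*100) / 400 = 0.875
```

An unweighted mean of the same two scores would be 0.85, so the two aggregation methods can disagree noticeably when subset sizes differ.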