liushz
For datasets on the OpenCompass 1.0 Leaderboard, you can simply hover your cursor over a dataset's score to find its config. For example, the config for C-Eval can be...
> > For datasets on OpenCompass 1.0 Leaderboard, you can just move your cursor on the score of the dataset to find the config. For example, the config for C-Eval...
Thanks for pointing out this typo; we will fix it soon.
Like the config above, you can set `model_kwargs=dict(tensor_parallel_size=8)` for your case.
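For illustration, a minimal model-config sketch assuming the vLLM backend (the model path, abbreviation, and the other field values here are placeholders, not part of the original thread):

```python
from opencompass.models import VLLM  # assuming the vLLM backend is used

models = [
    dict(
        type=VLLM,
        abbr='my-model-vllm',            # hypothetical abbreviation
        path='path/to/your/model',       # placeholder model path
        model_kwargs=dict(tensor_parallel_size=8),  # shard the model across 8 GPUs
        max_out_len=100,                 # illustrative values
        max_seq_len=2048,
        batch_size=32,
        run_cfg=dict(num_gpus=8, num_procs=1),  # request 8 GPUs for this model
    )
]
```

`tensor_parallel_size` is forwarded to the vLLM engine, so it should match the number of GPUs you request in `run_cfg`.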
Have you changed your partition logic midway? If you run it all at once, this problem should not occur.
The error in your eval stage is caused by errors during your infer stage: the number of predictions differs from the number of references. You can check the...
Apologies for the confusion. We are currently utilizing the `flores200` dataset; however, the configuration `flores_gen_806ede` mistakenly employs the prompt for `flores100`. We will address and rectify this issue promptly.
For optimal performance, it is advisable to set the `max_seq_len` parameter to the highest feasible value, such as 32768 or even higher if possible. As for `max_out_len`, it typically...
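As a sketch, both parameters live in the model config; the model type, path, and the concrete values below are illustrative assumptions, not prescriptions:

```python
from opencompass.models import HuggingFaceCausalLM  # assuming a HuggingFace model

models = [
    dict(
        type=HuggingFaceCausalLM,
        path='path/to/your/model',  # placeholder model path
        max_seq_len=32768,  # set as high as the model and GPU memory allow
        max_out_len=100,    # depends on the expected answer length of the dataset
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```

A `max_seq_len` that is too small truncates long prompts, while `max_out_len` only bounds generation length, so it can stay modest for short-answer benchmarks.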
What is the content of your `eval_demo.py`? I used the default config `/opencompass/configs/datasets/ARC_e/ARC_e_gen.py`, and it works just fine.
Please add a default config named `mmmlu_gen.py` for chat-model generation, with content like:

```python
from mmengine.config import read_base

with read_base():
    from .mmmlu_gen_xxx import mmmlu_datasets  # noqa: F401, F403
```

(Note that the module is imported without the `.py` suffix.)