
[Bug] Evaluating the wikitext ppl dataset fails to produce a result because there are no references

Open LanDisen opened this issue 1 year ago • 3 comments

Prerequisites

  • [X] I have searched the issues and discussions but did not get the help I expected.
  • [X] The bug has not been fixed in the latest version.

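Issue type

I am evaluating with an officially supported task/model/dataset.

Environment

I cloned the opencompass repository, entered it, and installed the required packages to set up the environment: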

git clone https://github.com/open-compass/opencompass.git
cd opencompass
pip install -r requirements.txt

Reproducing the problem - code/configuration sample

I wrote eval_qwen2_7b.py and placed it in the configs directory to evaluate the ppl of Qwen-7b on the wikitext dataset:

from mmengine.config import read_base
from opencompass.models import HuggingFaceBaseModel

with read_base():
    from opencompass.configs.datasets.wikitext.wikitext_103_raw_ppl import wikitext_103_raw_datasets

datasets = wikitext_103_raw_datasets

models = [
    dict(
        type=HuggingFaceBaseModel,
        abbr='qwen-7b-hf',
        path='Qwen/Qwen-7B',
        max_out_len=1024,
        batch_size=32,
        run_cfg=dict(num_gpus=2),
    )
]

Reproducing the problem - command or script

I then ran the evaluation as described in the opencompass README:

python -u run.py configs/eval_qwen2_7b.py -w outputs/qwen2_7b --debug

Reproducing the problem - error message

The log output is as follows:

python run.py configs/eval_qwen2_7b_wikitext.py -w outputs/qwen2_7b --debug
10/14 15:48:59 - OpenCompass - INFO - Current exp folder: outputs/qwen2_7b/20241014_154859
10/14 15:48:59 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
10/14 15:48:59 - OpenCompass - INFO - Partitioned into 1 tasks.
10/15 01:25:25 - OpenCompass - INFO - Partitioned into 2 tasks.
Traceback (most recent call last):
  File "/Users/lann/opencompass/eval/run.py", line 4, in <module>
    main()
  File "/Users/lann/opencompass/eval/cli/main.py", line 351, in main
    runner(tasks)
  File "/Users/lann/opencompass/opencompass/runners/base.py", line 38, in __call__
    status = self.launch(tasks)
             ^^^^^^^^^^^^^^^^^^
  File "/Users/lann/opencompass/opencompass/runners/local.py", line 131, in launch
    task.run()
  File "/Users/lann/opencompass/opencompass/tasks/openicl_eval.py", line 114, in run
    self._score()
  File "/Users/lann/opencompass/opencompass/tasks/openicl_eval.py", line 250, in _score
    result = icl_evaluator.score(**preds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lann/opencompass/opencompass/openicl/icl_evaluator/icl_hf_evaluator.py", line 70, in score
    if len(predictions) != len(references):
                           ^^^^^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()

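I looked at outputs/qwen2_7b/predictions/wikitext-103-raw-validation.json and found that it contains no gold entries, only the model's predictions. As a result, when the evaluator reaches if len(predictions) != len(references):, references is None, so len(references) raises the TypeError above and the evaluation aborts.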
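The failure can be reproduced in isolation. The following is a minimal sketch, not OpenCompass code: `missing_gold` and `check_lengths` are hypothetical helpers, and the `gold` / `prediction` keys and the dictionary layout of the predictions file are assumptions based on the description above.

```python
from typing import Optional, Sequence


def missing_gold(entries: dict) -> list:
    """Return the keys of prediction entries that carry no 'gold' field."""
    return [key for key, entry in entries.items() if entry.get("gold") is None]


def check_lengths(predictions: Sequence, references: Optional[Sequence]) -> bool:
    """Hypothetical guarded version of the length comparison in the traceback."""
    if references is None:
        # A bare len(references) would raise the TypeError shown in the log.
        raise ValueError("no references: nothing to compare the predictions against")
    return len(predictions) == len(references)


# Assumed shape of the predictions file: predictions only, no gold.
entries = {"0": {"prediction": 12.3}, "1": {"prediction": 15.8}}
print(missing_gold(entries))

try:
    len(None)  # what the unguarded comparison effectively does
except TypeError as exc:
    print(exc)
```

A guard like this only turns the crash into a clearer error message. Getting an actual score would still require an evaluator that does not expect references for a perplexity-only dataset.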

Other information

No response

LanDisen • Oct 15 '24 03:10

Hello, have you managed to solve this problem yet?

1ucky2 • Apr 02 '25 15:04

No, it was not solved. I later switched to lm-evaluation-harness for evaluation.

LanDisen • Apr 03 '25 00:04

> No, it was not solved. I later switched to lm-evaluation-harness for evaluation.

Hello, I have seen people say that lm-evaluation-harness also seems to give wikitext results that do not match the baseline. Did you resolve that? Also, when I use it I get the error below, even though my model is 0.5B and the batch size is 1. Have you run into this? Thanks.

/miniconda3/lib/python3.12/site-packages/torch/cuda/memory.py", line 738, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
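The error text itself suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so that the failing call is reported synchronously. A minimal sketch of setting that from Python, assuming the evaluation is launched as a subprocess; `debug_env` is a hypothetical helper, not part of any library:

```python
import os
from typing import Mapping, Optional


def debug_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment and enable synchronous CUDA error reporting."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Example with an explicit base environment; the caller's mapping is not modified.
env = debug_env({"PATH": "/usr/bin"})
print(env["CUDA_LAUNCH_BLOCKING"])
```

The resulting dictionary can be passed as `env=debug_env()` to `subprocess.run(...)` together with whatever evaluation command is being used. This only makes the traceback more trustworthy; it does not by itself free a device that is busy or unavailable.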

1ucky2 • Apr 03 '25 03:04