[Bug] Fine-tuned Qwen2.5-7B outputs only exclamation marks ('!!!!!!')
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] The bug has not been fixed in the latest version.
Type
I have modified the code (config is not considered code), or I'm working on my own tasks/models/datasets.
Environment
{'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda', 'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0', 'GPU 0,1,2,3': 'Tesla V100-PCIE-32GB', 'MMEngine': '0.10.5', 'MUSA available': False, 'NVCC': 'Cuda compilation tools, release 12.1, V12.1.105', 'OpenCV': '4.10.0', 'PyTorch': '2.5.1', 'PyTorch compiling details': 'PyTorch built with:\n' ' - GCC 9.3\n' ' - C++ Version: 201703\n' ' - Intel(R) oneAPI Math Kernel Library Version ' '2023.1-Product Build 20230303 for Intel(R) 64 ' 'architecture applications\n' ' - Intel(R) MKL-DNN v3.5.3 (Git Hash ' '66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n' ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n' ' - LAPACK is enabled (usually provided by ' 'MKL)\n' ' - NNPACK is enabled\n' ' - CPU capability usage: AVX512\n' ' - CUDA Runtime 12.1\n' ' - NVCC architecture flags: ' '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n' ' - CuDNN 90.1 (built against CUDA 12.4)\n' ' - Magma 2.6.1\n' ' - Build settings: BLAS_INFO=mkl, ' 'BUILD_TYPE=Release, CUDA_VERSION=12.1, ' 'CUDNN_VERSION=9.1.0, ' 'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, ' 'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 ' '-fabi-version=11 -fvisibility-inlines-hidden ' '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO ' '-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON ' '-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK ' '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE ' '-O2 -fPIC -Wall -Wextra -Werror=return-type ' '-Werror=non-virtual-dtor -Werror=bool-operation ' '-Wnarrowing -Wno-missing-field-initializers ' '-Wno-type-limits -Wno-array-bounds ' '-Wno-unknown-pragmas -Wno-unused-parameter ' '-Wno-strict-overflow -Wno-strict-aliasing ' '-Wno-stringop-overflow -Wsuggest-override ' '-Wno-psabi -Wno-error=old-style-cast ' '-Wno-missing-braces 
-fdiagnostics-color=always ' '-faligned-new -Wno-unused-but-set-variable ' '-Wno-maybe-uninitialized -fno-math-errno ' '-fno-trapping-math -Werror=format ' '-Wno-stringop-overflow, LAPACK_INFO=mkl, ' 'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, ' 'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, ' 'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, ' 'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, ' 'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, ' 'USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, ' 'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n', 'Python': '3.10.15 (main, Oct 3 2024, 07:27:34) [GCC 11.2.0]', 'TorchVision': '0.20.1', 'lmdeploy': "not installed:No module named 'lmdeploy'", 'numpy_random_seed': 2147483648, 'opencompass': '0.3.6+', 'sys.platform': 'linux', 'transformers': '4.46.3'}
Reproduces the problem - code/configuration sample
Just modify `hf_qwen2_5_7b_instruct.py` as follows:

```python
from opencompass.models import HuggingFacewithChatTemplate

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr='qwen2.5-7b-instruct-hf',
        path='/opt/disk2/models/qwen2.5_epoch_2',
        max_out_len=4096,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```

The only change is pointing `path` at the locally fine-tuned Qwen2.5-7B.
And modify `eval_chat_demo.py` as follows:

```python
from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.demo.demo_gsm8k_chat_gen import gsm8k_datasets
    from opencompass.configs.datasets.demo.demo_math_chat_gen import math_datasets
    from opencompass.configs.models.qwen2_5.hf_qwen2_5_7b_instruct import models as hf_qwen2_5_7b_instruct_models

datasets = gsm8k_datasets
models = hf_qwen2_5_7b_instruct_models
```

The only changes are the model being loaded and the datasets being used.
Reproduces the problem - command or script
python run.py configs/eval_chat_demo.py
Reproduces the problem - error message
No error. The output is '!!!!!!!'.
There is no error message, but every result under */predictions consists entirely of exclamation marks, e.g. demo_cmmlu-anatomy.json in predictions.
Evaluating on demo_gsm8k gives the same result: all exclamation marks.
Switching to the official, non-fine-tuned Qwen2.5-7B produces normal output.
Other information
Both the fine-tuned and the non-fine-tuned Qwen2.5-7B generate normally with the official inference code. Only the fine-tuned model, when evaluated with this project's code, outputs nothing but exclamation marks.
The screenshot below shows the fine-tuned model generating with the official code: the output is normal, not '!!!!!!'.
I don't know what is causing this.
Hi, how do I evaluate a locally fine-tuned large model? I couldn't find a tutorial for this in the docs.
The same way you evaluate a local open-source model: just replace the model file path.
I'd suggest routing inference through lmdeploy; see the "Accelerated Evaluation" chapter in the documentation.
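As a minimal sketch of that suggestion: the class and field names below are assumed from OpenCompass's bundled `lmdeploy_*` model configs (e.g. `TurboMindModelwithChatTemplate`, `engine_config`, `gen_config`); verify them against your installed version before use.

```python
# Sketch: replace the HuggingFace backend with lmdeploy's TurboMind engine.
# Names assumed from OpenCompass's bundled lmdeploy configs; not verified here.
from opencompass.models import TurboMindModelwithChatTemplate

models = [
    dict(
        type=TurboMindModelwithChatTemplate,
        abbr='qwen2.5-7b-finetuned-turbomind',
        path='/opt/disk2/models/qwen2.5_epoch_2',  # local fine-tuned weights
        engine_config=dict(session_len=16384, max_batch_size=16, tp=1),
        gen_config=dict(top_k=1, temperature=1e-6, top_p=0.9,
                        max_new_tokens=4096),
        max_seq_len=16384,
        max_out_len=4096,
        batch_size=16,
        run_cfg=dict(num_gpus=1),
    )
]
```

Run it with the same `python run.py configs/eval_chat_demo.py` command after importing these `models` in the eval config.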
Hi, the qwen-series models produce nothing but exclamation marks during evaluation. I've tried the official Qwen2-7B-Instruct and Qwen2.5-7B-Instruct, and both generate only exclamation marks. Could you look into what's causing this?
Meanwhile llama3-8B, glm-4-9b, and Yi-1.5-9B all run normally.
These models all work fine on my side, so the bug may be in your evaluation dataset. Try a few different datasets to confirm whether the behavior is the same. If it is, I'd likewise suggest replacing the default inference engine with vllm/lmdeploy: it speeds evaluation up and tends to be more compatible.
Hi, I'm running on Ascend 910 accelerators. The pipeline as a whole works with all the datasets you provide: every model except qwen runs through and the metrics look normal, but anything in the qwen series hits the exclamation-mark problem. Could this be caused by the prompt template, and if so, where would I change it? When I previously evaluated with mindie I also got exclamation marks, and there the template was indeed the cause.
Run `python tools/list_configs.py` in your environment to see all available dataset configs; the smallest granularity of an OpenCompass evaluation task is the dataset. For finer customization, look at the preset configs beginning with `eval` under the `configs` folder. I haven't tried Ascend 910, but I think you could write a mindie inference pipeline modeled on the existing huggingface/vllm/lmdeploy ones.
The way I solved this on my side: first check for fp16 precision problems. I upgraded CUDA to 12.1 and transformers from 4.34 to 4.37, and then it worked.
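For context, one plausible mechanism behind the fp16 precision angle (an assumption, not confirmed in this thread): the V100s in the environment above do not support bf16, so Qwen2.5's bf16-trained weights end up running in fp16, whose maximum representable value is about 65504. Activations beyond that overflow to inf/NaN, and degenerate logits then tend to select token id 0, which decodes to "!" in Qwen's vocabulary. The narrow fp16 range is easy to demonstrate with Python's half-precision struct format:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE-754 half precision ('e' format)."""
    try:
        return struct.unpack('e', struct.pack('e', x))[0]
    except OverflowError:
        # struct refuses values beyond fp16's range (~65504); treat as inf,
        # which is what a saturating cast inside a model would produce.
        return float('inf')

print(to_fp16(60000.0))  # 60000.0: still exactly representable in fp16
print(to_fp16(70000.0))  # inf: past fp16's maximum
```

If torch is available, `torch.cuda.is_bf16_supported()` is a quick way to check whether your card can run in bf16 instead of fp16 at all.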
Has this been resolved?
The official maintainers of this project respond quite slowly now, and nobody is reproducing or fixing this, so the bug is probably still present. My suggestion is to avoid plain transformers inference. Alternatively, start an online API server yourself and evaluate against that API, which covers most situations.
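A hedged sketch of the API-based route: the parameter names below follow OpenCompass's OpenAI-compatible model wrapper as I understand it (`openai_api_base`, `key`), and the endpoint URL and served model name are placeholders for whatever server you deploy; check both against your OpenCompass version.

```python
# Sketch: evaluate an OpenAI-compatible endpoint instead of loading weights
# with transformers directly. Endpoint URL and model name are hypothetical.
from opencompass.models import OpenAI

models = [
    dict(
        type=OpenAI,
        abbr='qwen2.5-7b-finetuned-api',
        path='qwen2.5-7b-finetuned',  # model name as served by your API
        openai_api_base='http://localhost:8000/v1/chat/completions',
        key='EMPTY',                  # local servers often ignore the key
        max_out_len=4096,
        batch_size=8,
    )
]
```

Any server exposing the OpenAI chat-completions protocol (e.g. a vllm or lmdeploy `api_server`) should work as the backend for this config.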
Got it, thanks a lot!