[Bug] Takes too much time on MATH-500 dataset evaluation
Prerequisite
- [x] I have searched Issues and Discussions but cannot get the expected help.
- [x] The bug has not been fixed in the latest version.
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
Inference itself runs correctly in this environment.
Reproduces the problem - code/configuration sample
python run.py --datasets math_500_gen --hf-type base --hf-path /home/maoshizhuo/2025/deepseek-Qwen-1.5B --debug --max-out-len 32768
02/25 23:53:14 - OpenCompass - INFO - Loading math_500_gen: /home/maoshizhuo/2025/opencompass/opencompass/configs/./datasets/math/math_500_gen.py
02/25 23:53:14 - OpenCompass - INFO - Loading example: /home/maoshizhuo/2025/opencompass/opencompass/configs/./summarizers/example.py
02/25 23:53:14 - OpenCompass - INFO - Current exp folder: outputs/default/20250225_235314
02/25 23:53:14 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
02/25 23:53:14 - OpenCompass - INFO - Partitioned into 1 tasks.
02/25 23:53:16 - OpenCompass - WARNING - Only use 1 GPUs for total 4 available GPUs in debug mode.
02/25 23:53:16 - OpenCompass - INFO - Task [deepseek-Qwen-1.5B_hf/math-500]
02/25 23:53:33 - OpenCompass - INFO - Try to load the data from /home/maoshizhuo/.cache/opencompass/./data/math/
02/25 23:53:33 - OpenCompass - INFO - Start inferencing [deepseek-Qwen-1.5B_hf/math-500]
11%|███████████████ | 7/63 [13:49:33<118:18:44, 7605.80s/it]
Reproduces the problem - command or script
python run.py --datasets math_500_gen --hf-type base --hf-path /home/maoshizhuo/2025/deepseek-Qwen-1.5B --debug --max-out-len 32768
Reproduces the problem - error message
The evaluation takes far too long: the progress bar estimates about 131 hours to finish.
Other information
Is there any way to speed up inference? I noticed that vLLM can accelerate inference, but since it integrates quantization techniques, the resulting accuracy is not exact. I would like to get accurate results while also speeding things up. My experimental environment has 4 V100-32G GPUs. Thank you!
If your model is a chat model, try `--hf-type chat`; this will use the model's chat_template. Separately, since OC uses HF under the hood to generate, try calling the original HF generate on one example to see whether it also takes that long.
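To run that sanity check, a minimal timing harness is enough; the `hf_generate` wrapper shown in the comment is a hypothetical sketch of standard transformers usage (adapt the tokenizer/model wiring to your setup), while the harness itself is plain stdlib:

```python
import time

def time_generation(generate_fn, prompt, n_runs=1):
    """Time a generation callable on a single prompt.

    generate_fn: any function that takes a prompt string and returns text,
    e.g. a thin wrapper around transformers' model.generate() such as:

        # def hf_generate(prompt):
        #     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        #     out = model.generate(**inputs, max_new_tokens=32768)
        #     return tokenizer.decode(out[0], skip_special_tokens=True)

    Returns the last output and the best-of-n_runs wall-clock time in seconds.
    """
    timings = []
    output = None
    for _ in range(n_runs):
        start = time.perf_counter()
        output = generate_fn(prompt)
        timings.append(time.perf_counter() - start)
    return output, min(timings)

# Stand-in generator just to show the call shape; swap in hf_generate above.
out, secs = time_generation(lambda p: p.upper(), "What is 2+2?")
```

If one example already takes ~2 hours here, the bottleneck is raw HF generation rather than OpenCompass; if not, the problem is in the evaluation setup.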
Please check the predictions to see if there is a repeating pattern in the responses, and reduce `--max-out-len`. You can also remove `--debug` and use four workers.
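A quick way to screen prediction files for that kind of degenerate repetition is to test whether the tail of a response is one short chunk repeated many times, which is the usual signature of a model stuck in a generation loop. This is a stdlib sketch; the chunk-size and repeat thresholds are arbitrary choices, not OpenCompass internals:

```python
def has_repeat_pattern(text, min_chunk=8, max_chunk=64, min_repeats=10):
    """Return True if the tail of `text` consists of a short chunk
    (min_chunk..max_chunk chars) repeated at least min_repeats times."""
    for size in range(min_chunk, max_chunk + 1):
        chunk = text[-size:]
        tail = text[-size * min_repeats:]
        # Only a match if the tail is long enough and exactly periodic.
        if len(tail) == size * min_repeats and tail == chunk * min_repeats:
            return True
    return False
```

Running this over each entry in the `predictions/` JSON output will quickly show whether the 32768-token budget is being burned on loops rather than genuine reasoning.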
The best option is to use vllm or lmdeploy, because math tasks require the model to generate a long reasoning process.
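For reference, here is a sketch of an OpenCompass model config using the vLLM backend with tensor parallelism across the 4 V100s. The import path and field names follow the OpenCompass `VLLM` wrapper as I understand it, so treat them as assumptions and verify against your installed version; note that vLLM does not quantize weights unless you explicitly enable quantization, so accuracy should match the HF backend up to numerical noise:

```python
# Hypothetical OpenCompass config fragment (verify field names for your version).
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='deepseek-qwen-1.5b-vllm',
        path='/home/maoshizhuo/2025/deepseek-Qwen-1.5B',
        # V100 does not support bfloat16, so force float16.
        model_kwargs=dict(tensor_parallel_size=4, dtype='float16'),
        max_out_len=32768,
        max_seq_len=32768,
        batch_size=32,
        generation_kwargs=dict(temperature=0.0),
        run_cfg=dict(num_gpus=4),
    )
]
```

Some recent OpenCompass versions also expose an `--accelerator vllm` flag on run.py that converts an HF model config to the vLLM backend automatically; check `python run.py --help` to see whether your version supports it.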
Could you please help me solve this issue? https://github.com/open-compass/opencompass/issues/1929