
[Bug] The model-kwargs and generation-kwargs parameters cannot be specified via the CLI

Open XIAOHUIL1 opened this issue 10 months ago • 3 comments

Prerequisite

  • [x] I have searched Issues and Discussions but did not get the expected help.
  • [x] The bug has not been fixed in the latest version.

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True, 'CUDA_HOME': '/opt/dtk', 'GCC': 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0', 'GPU 0,1,2,3,4,5,6,7': 'BW200', 'MMEngine': '0.10.7', 'MUSA available': False, 'NVCC': 'Not Available', 'OpenCV': '4.11.0', 'PyTorch': '2.4.1', 'PyTorch compiling details': 'PyTorch built with:\n' ' - GCC 10.3\n' ' - C++ Version: 201703\n' ' - Intel(R) Math Kernel Library Version ' '2020.0.4 Product Build 20200917 for Intel(R) 64 ' 'architecture applications\n' ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n' ' - LAPACK is enabled (usually provided by ' 'MKL)\n' ' - NNPACK is enabled\n' ' - CPU capability usage: AVX2\n' ' - HIP Runtime 6.1.25082\n' ' - MIOpen 2.16.0\n' ' - Magma 2.8.0\n' ' - Build settings: BLAS_INFO=mkl, ' 'BUILD_TYPE=Release, ' 'CXX_COMPILER=/opt/rh/gcc-toolset-10/root/usr/bin/c++, ' 'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 ' '-fvisibility-inlines-hidden -DUSE_PTHREADPOOL ' '-DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI ' '-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK ' '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE ' '-O2 -fPIC -Wall -Wextra -Werror=return-type ' '-Werror=non-virtual-dtor -Werror=bool-operation ' '-Wnarrowing -Wno-missing-field-initializers ' '-Wno-type-limits -Wno-array-bounds ' '-Wno-unknown-pragmas -Wno-unused-parameter ' '-Wno-unused-function -Wno-unused-result ' '-Wno-strict-overflow -Wno-strict-aliasing ' '-Wno-stringop-overflow -Wsuggest-override ' '-Wno-psabi -Wno-error=pedantic ' '-Wno-error=old-style-cast -Wno-missing-braces ' '-fdiagnostics-color=always -faligned-new ' '-Wno-unused-but-set-variable ' '-Wno-maybe-uninitialized -fno-math-errno ' '-fno-trapping-math -Werror=format ' '-Wno-stringop-overflow, ' 'FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, ' 'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, ' 'PERF_WITH_AVX512=1, TORCH_VERSION=2.4.1, ' 'USE_CUDA=0, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, ' 'USE_EXCEPTION_PTR=1, USE_GFLAGS=1, USE_GLOG=1, ' 'USE_GLOO=1, USE_MKL=ON, USE_MKLDNN=0, ' 'USE_MPI=1, USE_NCCL=1, USE_NNPACK=ON, ' 'USE_OPENMP=1, USE_ROCM=ON, ' 
'USE_ROCM_KERNEL_ASSERT=OFF, \n', 'Python': '3.10.12 (main, Feb 27 2025, 11:02:11) [GCC 11.4.0]', 'TorchVision': '0.19.1', 'lmdeploy': "not installed:No module named 'lmdeploy'", 'numpy_random_seed': 2147483648, 'opencompass': '0.4.1+', 'sys.platform': 'linux', 'transformers': '4.51.1'}

Reproduces the problem - code/configuration sample

I am using the command line and would like to specify parameters such as dtype, enforce_eager, and temperature via --model-kwargs and --generation-kwargs. I have tried many approaches, including modifying the source in opencompass/models/vllm.py and vllm_with_tf_above_v4_33.py, but all of them failed; I can only hard-code the parameters instead of passing them through the CLI.

Reproduces the problem - command or script

The command line is as follows: opencompass --hf-type base --hf-path /models/DeepSeek-R1-Distill-Qwen-1.5B --datasets humaneval_gen_66a7f4 --hf-num-gpus 1 --model-kwargs dtype=float16 --batch-size 256 -a vllm --generation-kwargs do_sample=True top_k=50 top_p=0.95 temperature=0.95 --debug --dump-eval-details --dump-extract-rate
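
For context, these CLI flags take space-separated key=value pairs that are converted into a Python dict. The sketch below shows how such pairs can be parsed into typed values; `parse_kv_args` is a hypothetical helper for illustration, not OpenCompass's actual parser:

```python
import ast

def parse_kv_args(pairs):
    """Parse CLI-style key=value pairs (as given to --model-kwargs or
    --generation-kwargs) into a dict, converting literals like True,
    50, and 0.95 to their Python types."""
    out = {}
    for pair in pairs:
        key, _, raw = pair.partition('=')
        try:
            # 'True' -> bool, '50' -> int, '0.95' -> float
            out[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # fall back to the raw string, e.g. dtype=float16
            out[key] = raw
    return out

print(parse_kv_args(['do_sample=True', 'top_k=50', 'top_p=0.95', 'temperature=0.95']))
```

If the parsed dict is then dropped when the model config is assembled (as the comment below pinpoints), the values never reach vLLM, which matches the behavior reported here.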

Reproduces the problem - error message

The output is as follows:

04/18 16:08:45 - OpenCompass - INFO - Task [DeepSeek-R1-Distill-Qwen-1.5B_hf-vllm/openai_humaneval] [2025-04-18 16:08:48,466] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) INFO 04-18 16:08:49 init.py:193] Automatically detected platform rocm. model_kwargs passed to VLLM: {'max_model_len': None, 'tensor_parallel_size': 1} Model kwargs after update: {'trust_remote_code': True, 'dtype': 'bfloat16', 'enforce_eager': True} INFO 04-18 16:09:02 config.py:548] This model supports multiple tasks: {'classify', 'embed', 'reward', 'score', 'generate'}. Defaulting to 'generate'. INFO 04-18 16:09:02 config.py:1416] Disabled the custom all-reduce kernel because it is not supported on hcus. WARNING 04-18 16:09:02 arg_utils.py:1170] The model has a long context length (131072). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value. WARNING 04-18 16:09:02 rocm.py:115] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used WARNING 04-18 16:09:02 config.py:684] Async output processing is not supported on the current platform type cuda. 
INFO 04-18 16:09:02 llm_engine.py:235] Initializing a V0 LLM engine (v0.7.2) with config: model='/models/DeepSeek-R1-Distill-Qwen-1.5B', speculative_config=None, tokenizer='/models/DeepSeek-R1-Distill-Qwen-1.5B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/models/DeepSeek-R1-Distill-Qwen-1.5B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[],"max_capture_size":0}, use_cached_outputs=False,

Other information

No response

XIAOHUIL1 avatar Apr 18 '25 08:04 XIAOHUIL1

Thanks for your question; we will fix this issue.

tonysy avatar Apr 23 '25 10:04 tonysy

I encountered the same issue and found that it may be caused by a bug in https://github.com/open-compass/opencompass/blob/0.5.1.post1/opencompass/utils/run.py#L316, where the generation_kwargs parameter is dropped when the VLLMwithChatTemplate model config is generated:

acc_model = dict(
    type=f'{mod.__module__}.{mod.__name__}',
    abbr=model['abbr'].replace('hf', 'vllm') if '-hf' in model['abbr'] else model['abbr'] + '-vllm',
    path=model['path'],
    model_kwargs=model_kwargs,
    max_seq_len=model.get('max_seq_len', None),
    max_out_len=model['max_out_len'],
    batch_size=model.get('batch_size', 16),
    run_cfg=model['run_cfg'],
    stop_words=model.get('stop_words', []),
)

I found that simply adding generation_kwargs resolves the issue:

acc_model = dict(
    type=f'{mod.__module__}.{mod.__name__}',
    abbr=model['abbr'].replace('hf', 'vllm') if '-hf' in model['abbr'] else model['abbr'] + '-vllm',
    path=model['path'],
    model_kwargs=model_kwargs,
    max_seq_len=model.get('max_seq_len', None),
    max_out_len=model['max_out_len'],
    batch_size=model.get('batch_size', 16),
    run_cfg=model['run_cfg'],
    stop_words=model.get('stop_words', []),
    generation_kwargs=model['generation_kwargs'].copy(),  # add generation_kwargs
)
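
One caveat with this patch: if --generation-kwargs was not passed at all, indexing model['generation_kwargs'] would raise a KeyError. A slightly more defensive variant mirrors the .get pattern already used for max_seq_len and batch_size; the model dict below is a hypothetical stand-in for the config run.py builds, shown only to make the sketch runnable:

```python
# Hypothetical minimal `model` dict; note it has no 'generation_kwargs'
# key, as when the CLI flag is omitted.
model = {
    'abbr': 'DeepSeek-R1-Distill-Qwen-1.5B-hf',
    'path': '/models/DeepSeek-R1-Distill-Qwen-1.5B',
    'max_out_len': 1024,
    'run_cfg': {'num_gpus': 1},
}

acc_model = dict(
    abbr=model['abbr'].replace('hf', 'vllm') if '-hf' in model['abbr'] else model['abbr'] + '-vllm',
    path=model['path'],
    max_seq_len=model.get('max_seq_len', None),
    max_out_len=model['max_out_len'],
    batch_size=model.get('batch_size', 16),
    run_cfg=model['run_cfg'],
    stop_words=model.get('stop_words', []),
    # .get() avoids a KeyError when --generation-kwargs was not supplied
    generation_kwargs=model.get('generation_kwargs', {}).copy(),
)
print(acc_model['abbr'], acc_model['generation_kwargs'])
```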

Logs when using --generation-kwargs temperature=1 max_tokens=512:

Before:
10/23 16:56:07 - OpenCompass - INFO - SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=['<|im_end|>', '<|endoftext|>'], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None)

After:
10/23 17:17:41 - OpenCompass - INFO - SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=['<|endoftext|>', '<|im_end|>'], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None)

Aatrox103 avatar Oct 23 '25 09:10 Aatrox103

Thank you very much for your answer; it is very helpful to me. We had been launching a vLLM server for online inference to work around this bug, but with this fix offline inference works as well.

XIAOHUIL1 avatar Oct 24 '25 07:10 XIAOHUIL1