[Bug] CodeTrans UT test fails
Priority
P1-Stopper
OS type
Ubuntu
Hardware type
Xeon-GNR
Installation method
- [x] Pull docker images from hub.docker.com
- [x] Build docker images from source
- [ ] Other
- [ ] N/A
Deploy method
- [x] Docker
- [x] Docker Compose
- [ ] Kubernetes Helm Charts
- [ ] Kubernetes GMC
- [ ] Other
- [ ] N/A
Running nodes
Single Node
What's the version?
Description
The CodeTrans UT on Xeon fails because the vLLM serving container (codetrans-xeon-vllm-service) never becomes ready: the engine process crashes while loading mistralai/Mistral-7B-Instruct-v0.3 on CPU with `TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'`.

Failing CI run: https://github.com/opea-project/GenAIExamples/actions/runs/15189596329/job/42752387718
Output of `docker logs codetrans-xeon-vllm-service`:

INFO 05-23 02:56:34 [__init__.py:239] Automatically detected platform cpu.
WARNING 05-23 02:56:36 [_logger.py:72] Torch Profiler is enabled in the API server. This should ONLY be used for local development!
INFO 05-23 02:56:36 [api_server.py:1034] vLLM API server version 0.8.3
INFO 05-23 02:56:36 [api_server.py:1035] args: Namespace(host='0.0.0.0', port=80, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='mistralai/Mistral-7B-Instruct-v0.3', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, data_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, prefix_caching_hash_algo='builtin', disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_config=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_cascade_attn=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False)
INFO 05-23 02:56:42 [config.py:600] This model supports multiple tasks: {'generate', 'reward', 'score', 'classify', 'embed'}. Defaulting to 'generate'.
WARNING 05-23 02:56:42 [_logger.py:72] device type=cpu is not supported by the V1 Engine. Falling back to V0.
INFO 05-23 02:56:42 [config.py:1634] Disabled the custom all-reduce kernel because it is not supported on current platform.
WARNING 05-23 02:56:42 [_logger.py:72] Environment variable VLLM_CPU_KVCACHE_SPACE (GiB) for CPU backend is not set, using 4 by default.
WARNING 05-23 02:56:42 [_logger.py:72] uni is not supported on CPU, fallback to mp distributed executor backend.
INFO 05-23 02:56:42 [api_server.py:246] Started engine process with PID 273
/opt/venv/lib/python3.12/site-packages/vllm/transformers_utils/tokenizer_group/tokenizer_group.py:25: FutureWarning: It is strongly recommended to run mistral models with --tokenizer-mode "mistral" to ensure correct encoding and decoding.
  self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
INFO 05-23 02:56:46 [__init__.py:239] Automatically detected platform cpu.
WARNING 05-23 02:56:47 [_logger.py:72] Torch Profiler is enabled in the API server. This should ONLY be used for local development!
INFO 05-23 02:56:47 [llm_engine.py:242] Initializing a V0 LLM engine (v0.8.3) with config: model='mistralai/Mistral-7B-Instruct-v0.3', speculative_config=None, tokenizer='mistralai/Mistral-7B-Instruct-v0.3', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cpu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=mistralai/Mistral-7B-Instruct-v0.3, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
/opt/venv/lib/python3.12/site-packages/vllm/transformers_utils/tokenizer_group/tokenizer_group.py:25: FutureWarning: It is strongly recommended to run mistral models with --tokenizer-mode "mistral" to ensure correct encoding and decoding.
  self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
INFO 05-23 02:56:48 [cpu.py:45] Using Torch SDPA backend.
INFO 05-23 02:56:48 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 05-23 02:56:48 [cpu_worker.py:196] Profiling enabled. Traces will be saved to: /mnt
INFO 05-23 02:56:48 [parallel_state.py:957] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
ERROR 05-23 02:56:48 [engine.py:448] unsupported operand type(s) for *: 'int' and 'NoneType'
ERROR 05-23 02:56:48 [engine.py:448] Traceback (most recent call last):
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
ERROR 05-23 02:56:48 [engine.py:448]     engine = MQLLMEngine.from_vllm_config(
ERROR 05-23 02:56:48 [engine.py:448]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Process SpawnProcess-1:
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
ERROR 05-23 02:56:48 [engine.py:448]     return cls(
ERROR 05-23 02:56:48 [engine.py:448]            ^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
ERROR 05-23 02:56:48 [engine.py:448]     self.engine = LLMEngine(*args, **kwargs)
ERROR 05-23 02:56:48 [engine.py:448]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 281, in __init__
ERROR 05-23 02:56:48 [engine.py:448]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 05-23 02:56:48 [engine.py:448]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 286, in __init__
ERROR 05-23 02:56:48 [engine.py:448]     super().__init__(*args, **kwargs)
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 05-23 02:56:48 [engine.py:448]     self._init_executor()
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor
ERROR 05-23 02:56:48 [engine.py:448]     self._run_workers("load_model",
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
ERROR 05-23 02:56:48 [engine.py:448]     driver_worker_output = run_method(self.driver_worker, sent_method,
ERROR 05-23 02:56:48 [engine.py:448]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/utils.py", line 2347, in run_method
ERROR 05-23 02:56:48 [engine.py:448]     return func(*args, **kwargs)
ERROR 05-23 02:56:48 [engine.py:448]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/worker/cpu_worker.py", line 233, in load_model
ERROR 05-23 02:56:48 [engine.py:448]     self.model_runner.load_model()
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/worker/cpu_model_runner.py", line 491, in load_model
ERROR 05-23 02:56:48 [engine.py:448]     self.model = get_model(vllm_config=self.vllm_config)
ERROR 05-23 02:56:48 [engine.py:448]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 05-23 02:56:48 [engine.py:448]     return loader.load_model(vllm_config=vllm_config)
ERROR 05-23 02:56:48 [engine.py:448]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 441, in load_model
ERROR 05-23 02:56:48 [engine.py:448]     model = _initialize_model(vllm_config=vllm_config)
ERROR 05-23 02:56:48 [engine.py:448]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 127, in _initialize_model
ERROR 05-23 02:56:48 [engine.py:448]     return model_class(vllm_config=vllm_config, prefix=prefix)
ERROR 05-23 02:56:48 [engine.py:448]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 486, in __init__
ERROR 05-23 02:56:48 [engine.py:448]     self.model = self._init_model(vllm_config=vllm_config,
ERROR 05-23 02:56:48 [engine.py:448]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 527, in _init_model
ERROR 05-23 02:56:48 [engine.py:448]     return LlamaModel(vllm_config=vllm_config,
ERROR 05-23 02:56:48 [engine.py:448]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 151, in __init__
ERROR 05-23 02:56:48 [engine.py:448]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 321, in __init__
ERROR 05-23 02:56:48 [engine.py:448]     self.start_layer, self.end_layer, self.layers = make_layers(
ERROR 05-23 02:56:48 [engine.py:448]                                                     ^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 610, in make_layers
ERROR 05-23 02:56:48 [engine.py:448]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
ERROR 05-23 02:56:48 [engine.py:448]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 323, in <lambda>
ERROR 05-23 02:56:48 [engine.py:448]     lambda prefix: layer_type(config=config,
ERROR 05-23 02:56:48 [engine.py:448]                    ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 239, in __init__
ERROR 05-23 02:56:48 [engine.py:448]     self.self_attn = LlamaAttention(
ERROR 05-23 02:56:48 [engine.py:448]                      ^^^^^^^^^^^^^^^
ERROR 05-23 02:56:48 [engine.py:448]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 135, in __init__
ERROR 05-23 02:56:48 [engine.py:448]     self.rotary_dim = int(partial_rotary_factor * self.head_dim)
ERROR 05-23 02:56:48 [engine.py:448]                           ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
ERROR 05-23 02:56:48 [engine.py:448] TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
Traceback (most recent call last):
  File "/root/.local/share/uv/python/cpython-3.12.10-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/.local/share/uv/python/cpython-3.12.10-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 450, in run_mp_engine
    raise e
  File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
    engine = MQLLMEngine.from_vllm_config(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
    return cls(
           ^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
    self.engine = LLMEngine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 281, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 286, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/opt/venv/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor
    self._run_workers("load_model",
  File "/opt/venv/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
    driver_worker_output = run_method(self.driver_worker, sent_method,
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/utils.py", line 2347, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/worker/cpu_worker.py", line 233, in load_model
    self.model_runner.load_model()
  File "/opt/venv/lib/python3.12/site-packages/vllm/worker/cpu_model_runner.py", line 491, in load_model
    self.model = get_model(vllm_config=self.vllm_config)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
    return loader.load_model(vllm_config=vllm_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 441, in load_model
    model = _initialize_model(vllm_config=vllm_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 127, in _initialize_model
    return model_class(vllm_config=vllm_config, prefix=prefix)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 486, in __init__
    self.model = self._init_model(vllm_config=vllm_config,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 527, in _init_model
    return LlamaModel(vllm_config=vllm_config,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 151, in __init__
    old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 321, in __init__
    self.start_layer, self.end_layer, self.layers = make_layers(
                                                    ^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 610, in make_layers
    maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 323, in <lambda>
    lambda prefix: layer_type(config=config,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 239, in __init__
    self.self_attn = LlamaAttention(
                     ^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 135, in __init__
    self.rotary_dim = int(partial_rotary_factor * self.head_dim)
                          ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1121, in <module>
    uvloop.run(run_server(args))
  File "/opt/venv/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/root/.local/share/uv/python/cpython-3.12.10-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/root/.local/share/uv/python/cpython-3.12.10-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/opt/venv/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1069, in run_server
    async with build_async_engine_client(args) as engine_client:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/uv/python/cpython-3.12.10-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/uv/python/cpython-3.12.10-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 269, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
Reproduce steps
cd CodeTrans/tests
bash test_compose_on_xeon.sh
Raw log
Attachments
No response