Error when using async vLLM rollout with the DAPO recipe
Adding actor_rollout_ref.rollout.mode="async" to recipe/dapo/run_dapo_qwen2.5_32b.sh raises the error below:
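For reference, the only change is the extra Hydra override appended to the launch command; the invocation below is abridged (the `...` stands for the stock script's other flags, and the exact entrypoint may differ by commit):

```shell
# In recipe/dapo/run_dapo_qwen2.5_32b.sh, append the override that switches
# rollout to the async server path (all other flags unchanged):
python3 -m recipe.dapo.main_dapo \
    ... \
    actor_rollout_ref.rollout.mode=async
```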
(AsyncvLLMServer pid=361610) instance_id: 6f66dda9-3270-44cf-823e-6bbb7a51c151:Hotrws:1:0 initializes with external actors: ['HotrwsWorkerDict_0:0']
Error executing job with overrides: ['data.train_files=/home//0723/data/dapo-math-17k.parquet', 'data.val_files=/home//0723/data/aime-2024.parquet', 'data.prompt_key=prompt', 'data.truncation=left', 'data.max_prompt_length=2048', 'data.max_response_length=2048', 'data.gen_batch_size=6', 'data.train_batch_size=2', 'actor_rollout_ref.rollout.n=16', 'algorithm.adv_estimator=grpo', 'algorithm.use_kl_in_reward=False', 'algorithm.kl_ctrl.kl_coef=0.0', 'actor_rollout_ref.actor.use_kl_loss=False', 'actor_rollout_ref.actor.kl_loss_coef=0.0', 'actor_rollout_ref.actor.clip_ratio_low=0.2', 'actor_rollout_ref.actor.clip_ratio_high=0.28', 'actor_rollout_ref.actor.clip_ratio_c=10.0', 'algorithm.filter_groups.enable=True', 'algorithm.filter_groups.max_num_gen_batches=10', 'algorithm.filter_groups.metric=acc', 'actor_rollout_ref.model.use_remove_padding=True', 'actor_rollout_ref.actor.use_dynamic_bsz=True', 'actor_rollout_ref.ref.log_prob_use_dynamic_bsz=True', 'actor_rollout_ref.rollout.log_prob_use_dynamic_bsz=True', 'actor_rollout_ref.actor.ppo_max_token_len_per_gpu=4096', 'actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=4096', 'actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=4096', 'actor_rollout_ref.model.path=/home//qwen3_0.6B', 'actor_rollout_ref.model.enable_gradient_checkpointing=True', 'actor_rollout_ref.actor.optim.lr=1e-6', 'actor_rollout_ref.actor.optim.lr_warmup_steps=10', 'actor_rollout_ref.actor.optim.weight_decay=0.1', 'actor_rollout_ref.actor.ppo_mini_batch_size=2', 'actor_rollout_ref.actor.fsdp_config.param_offload=True', 'actor_rollout_ref.actor.fsdp_config.optimizer_offload=True', 'actor_rollout_ref.actor.entropy_coeff=0', 'actor_rollout_ref.actor.grad_clip=1.0', 'actor_rollout_ref.actor.loss_agg_mode=token-mean', 'actor_rollout_ref.actor.ulysses_sequence_parallel_size=1', 'actor_rollout_ref.rollout.gpu_memory_utilization=0.80', 'actor_rollout_ref.rollout.tensor_model_parallel_size=1', 'actor_rollout_ref.rollout.enable_chunked_prefill=True', 
'actor_rollout_ref.rollout.max_num_batched_tokens=4096', 'actor_rollout_ref.rollout.temperature=1.0', 'actor_rollout_ref.rollout.top_p=1.0', 'actor_rollout_ref.rollout.top_k=-1', 'actor_rollout_ref.rollout.val_kwargs.temperature=1.0', 'actor_rollout_ref.rollout.val_kwargs.top_p=0.7', 'actor_rollout_ref.rollout.val_kwargs.top_k=-1', 'actor_rollout_ref.rollout.val_kwargs.do_sample=True', 'actor_rollout_ref.rollout.val_kwargs.n=1', 'actor_rollout_ref.rollout.mode=async', 'actor_rollout_ref.ref.fsdp_config.param_offload=True', 'actor_rollout_ref.ref.ulysses_sequence_parallel_size=1', 'actor_rollout_ref.actor.fsdp_config.fsdp_size=-1', 'actor_rollout_ref.ref.strategy=fsdp2', 'actor_rollout_ref.actor.strategy=fsdp2', 'reward_model.reward_manager=dapo', 'actor_rollout_ref.rollout.enforce_eager=False', 'reward_model.overlong_buffer.enable=True', 'reward_model.overlong_buffer.len=512', 'reward_model.overlong_buffer.penalty_factor=1.0', 'trainer.logger=["console"]', 'trainer.project_name=DAPO', 'trainer.experiment_name=DAPO-Qwen2.5-32B', 'trainer.n_gpus_per_node=1', 'trainer.nnodes=1', 'trainer.val_before_train=False', 'trainer.test_freq=5', 'trainer.save_freq=5', 'trainer.total_epochs=1', 'trainer.default_local_dir=/home///ckpts/DAPO/DAPO-Qwen2.5-32B', 'trainer.device=npu', 'actor_rollout_ref.actor.use_torch_compile=False', 'actor_rollout_ref.ref.use_torch_compile=False', 'trainer.resume_mode=auto']
Traceback (most recent call last):
File "/home//0723/verl_rollout/recipe/dapo/main_dapo.py", line 34, in main
run_ppo(config)
File "/home//0723/verl_rollout/recipe/dapo/main_dapo.py", line 56, in run_ppo
ray.get(runner.run.remote(config))
File "/root/anaconda3/envs/verl_/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
File "/root/anaconda3/envs/verl_/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/root/anaconda3/envs/verl_/lib/python3.10/site-packages/ray/_private/worker.py", line 2772, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/root/anaconda3/envs/verl_/lib/python3.10/site-packages/ray/_private/worker.py", line 919, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::TaskRunner.run() (pid=308753, ip=90.90.97.74, actor_id=7cf7430036ba13df78cee69501000000, repr=<main_dapo.TaskRunner object at 0xffcfce76b5b0>)
File "/home//0723/verl_rollout/recipe/dapo/main_dapo.py", line 167, in run
trainer.init_workers()
File "/home//0723/verl_rollout/verl/trainer/ppo/ray_trainer.py", line 926, in init_workers
self.async_rollout_manager = AgentLoopManager(
File "/home//0723/verl_rollout/verl/experimental/agent_loop/agent_loop.py", line 423, in __init__
self._initialize_llm_servers()
File "/home//0723/verl_rollout/verl/experimental/agent_loop/agent_loop.py", line 475, in _initialize_llm_servers
ray.get([server.init_engine.remote() for server in self.async_llm_servers])
ray.exceptions.RayTaskError(AttributeError): ray::AsyncvLLMServer.init_engine() (pid=361610, ip=90.90.97.74, actor_id=06fec6cee17e2bfad0a4566a01000000, repr=<verl.workers.rollout.vllm_rollout.vllm_async_server.AsyncvLLMServer object at 0xffcfb1075390>)
File "/root/anaconda3/envs/verl_/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/root/anaconda3/envs/verl_/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home//0723/verl_rollout/verl/workers/rollout/vllm_rollout/vllm_async_server.py", line 266, in init_engine
vllm_config = self._create_engine_config(engine_args)
File "/home//0723/verl_rollout/verl/workers/rollout/vllm_rollout/vllm_async_server.py", line 294, in _create_engine_config
zmq_addresses = ray.get([worker.get_zeromq_address.remote() for worker in workers])
File "/home//0723/verl_rollout/verl/workers/rollout/vllm_rollout/vllm_async_server.py", line 294, in <listcomp>
zmq_addresses = ray.get([worker.get_zeromq_address.remote() for worker in workers])
File "/root/anaconda3/envs/verl_/lib/python3.10/site-packages/ray/actor.py", line 1534, in __getattr__
raise AttributeError(
AttributeError: 'ActorHandle' object has no attribute 'get_zeromq_address'
I’m running into the same issue—have you made any progress toward a fix or workaround?
Same here. Any fix added for this?
Same issue here. Main package versions: vllm 0.10.2, ray 2.49.0, torch 2.8.0, verl 0.5.0, megatron 0.5.1.
Same issue, using verl commit c5b189a1af496d0bc68320cd1d5bd7a1f1e3638a.
It is related to the agent loop: when the mode is set to async, AgentLoopManager is used (see ppo/ray_trainer.py and main_ppo.py for an example). DAPO with async fails because recipe/dapo/main_dapo.py skips some initialization of the Ray environment (or something similar, I am not familiar with the details) that AgentLoopManager relies on. So getting DAPO to work with async requires modifying recipe/dapo/main_dapo.py and recipe/dapo/dapo_ray_trainer.py to perform initializations similar to the PPO recipe; group filtering on reward results may need to be modified as well.
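A minimal sketch of the selection logic in question, consistent with the traceback: the failing call is `worker.get_zeromq_address.remote()`, which means the Ray actor behind the handle was built from a worker class that never defines that method, i.e. the sync worker. The class names below are assumptions inferred from the traceback and verl's PPO entrypoint, not verified against this DAPO commit:

```python
# Hedged sketch of how main_ppo.py appears to pick the rollout worker class
# from rollout.mode. The class names are assumptions (taken from verl's PPO
# entrypoint and the traceback), not verified against this DAPO commit.
def select_rollout_worker_cls(rollout_mode: str) -> str:
    """Return the (assumed) worker class name for a given rollout mode."""
    if rollout_mode == "async":
        # The async server path calls worker.get_zeromq_address.remote(),
        # so it needs a worker class that exposes get_zeromq_address().
        # Wiring in the sync worker here is what would raise the
        # AttributeError seen above.
        return "AsyncActorRolloutRefWorker"
    return "ActorRolloutRefWorker"

print(select_rollout_worker_cls("async"))
print(select_rollout_worker_cls("sync"))
```

If this reading is right, the DAPO fix is to mirror main_ppo.py's mode-dependent worker mapping in recipe/dapo/main_dapo.py rather than hard-coding the sync worker.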
Does anyone know whether the DAPO recipe supports async mode yet, or whether there are plans to add it?
Any resolution on this? @holyseven, what specific changes were needed?