verl: error when using async vLLM rollout with the DAPO recipe

Adding actor_rollout_ref.rollout.mode="async" to recipe/dapo/run_dapo_qwen2.5_32b.sh produces the following error:

(AsyncvLLMServer pid=361610) instance_id: 6f66dda9-3270-44cf-823e-6bbb7a51c151:Hotrws:1:0 initializes with external actors: ['HotrwsWorkerDict_0:0']
Error executing job with overrides: ['data.train_files=/home//0723/data/dapo-math-17k.parquet', 'data.val_files=/home//0723/data/aime-2024.parquet', 'data.prompt_key=prompt', 'data.truncation=left', 'data.max_prompt_length=2048', 'data.max_response_length=2048', 'data.gen_batch_size=6', 'data.train_batch_size=2', 'actor_rollout_ref.rollout.n=16', 'algorithm.adv_estimator=grpo', 'algorithm.use_kl_in_reward=False', 'algorithm.kl_ctrl.kl_coef=0.0', 'actor_rollout_ref.actor.use_kl_loss=False', 'actor_rollout_ref.actor.kl_loss_coef=0.0', 'actor_rollout_ref.actor.clip_ratio_low=0.2', 'actor_rollout_ref.actor.clip_ratio_high=0.28', 'actor_rollout_ref.actor.clip_ratio_c=10.0', 'algorithm.filter_groups.enable=True', 'algorithm.filter_groups.max_num_gen_batches=10', 'algorithm.filter_groups.metric=acc', 'actor_rollout_ref.model.use_remove_padding=True', 'actor_rollout_ref.actor.use_dynamic_bsz=True', 'actor_rollout_ref.ref.log_prob_use_dynamic_bsz=True', 'actor_rollout_ref.rollout.log_prob_use_dynamic_bsz=True', 'actor_rollout_ref.actor.ppo_max_token_len_per_gpu=4096', 'actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=4096', 'actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=4096', 'actor_rollout_ref.model.path=/home//qwen3_0.6B', 'actor_rollout_ref.model.enable_gradient_checkpointing=True', 'actor_rollout_ref.actor.optim.lr=1e-6', 'actor_rollout_ref.actor.optim.lr_warmup_steps=10', 'actor_rollout_ref.actor.optim.weight_decay=0.1', 'actor_rollout_ref.actor.ppo_mini_batch_size=2', 'actor_rollout_ref.actor.fsdp_config.param_offload=True', 'actor_rollout_ref.actor.fsdp_config.optimizer_offload=True', 'actor_rollout_ref.actor.entropy_coeff=0', 'actor_rollout_ref.actor.grad_clip=1.0', 'actor_rollout_ref.actor.loss_agg_mode=token-mean', 'actor_rollout_ref.actor.ulysses_sequence_parallel_size=1', 'actor_rollout_ref.rollout.gpu_memory_utilization=0.80', 'actor_rollout_ref.rollout.tensor_model_parallel_size=1', 'actor_rollout_ref.rollout.enable_chunked_prefill=True', 
'actor_rollout_ref.rollout.max_num_batched_tokens=4096', 'actor_rollout_ref.rollout.temperature=1.0', 'actor_rollout_ref.rollout.top_p=1.0', 'actor_rollout_ref.rollout.top_k=-1', 'actor_rollout_ref.rollout.val_kwargs.temperature=1.0', 'actor_rollout_ref.rollout.val_kwargs.top_p=0.7', 'actor_rollout_ref.rollout.val_kwargs.top_k=-1', 'actor_rollout_ref.rollout.val_kwargs.do_sample=True', 'actor_rollout_ref.rollout.val_kwargs.n=1', 'actor_rollout_ref.rollout.mode=async', 'actor_rollout_ref.ref.fsdp_config.param_offload=True', 'actor_rollout_ref.ref.ulysses_sequence_parallel_size=1', 'actor_rollout_ref.actor.fsdp_config.fsdp_size=-1', 'actor_rollout_ref.ref.strategy=fsdp2', 'actor_rollout_ref.actor.strategy=fsdp2', 'reward_model.reward_manager=dapo', 'actor_rollout_ref.rollout.enforce_eager=False', 'reward_model.overlong_buffer.enable=True', 'reward_model.overlong_buffer.len=512', 'reward_model.overlong_buffer.penalty_factor=1.0', 'trainer.logger=["console"]', 'trainer.project_name=DAPO', 'trainer.experiment_name=DAPO-Qwen2.5-32B', 'trainer.n_gpus_per_node=1', 'trainer.nnodes=1', 'trainer.val_before_train=False', 'trainer.test_freq=5', 'trainer.save_freq=5', 'trainer.total_epochs=1', 'trainer.default_local_dir=/home///ckpts/DAPO/DAPO-Qwen2.5-32B', 'trainer.device=npu', 'actor_rollout_ref.actor.use_torch_compile=False', 'actor_rollout_ref.ref.use_torch_compile=False', 'trainer.resume_mode=auto']
Traceback (most recent call last):
  File "/home//0723/verl_rollout/recipe/dapo/main_dapo.py", line 34, in main
    run_ppo(config)
  File "/home//0723/verl_rollout/recipe/dapo/main_dapo.py", line 56, in run_ppo
    ray.get(runner.run.remote(config))
  File "/root/anaconda3/envs/verl_/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/root/anaconda3/envs/verl_/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/verl_/lib/python3.10/site-packages/ray/_private/worker.py", line 2772, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/root/anaconda3/envs/verl_/lib/python3.10/site-packages/ray/_private/worker.py", line 919, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::TaskRunner.run() (pid=308753, ip=90.90.97.74, actor_id=7cf7430036ba13df78cee69501000000, repr=<main_dapo.TaskRunner object at 0xffcfce76b5b0>)
  File "/home//0723/verl_rollout/recipe/dapo/main_dapo.py", line 167, in run
    trainer.init_workers()
  File "/home//0723/verl_rollout/verl/trainer/ppo/ray_trainer.py", line 926, in init_workers
    self.async_rollout_manager = AgentLoopManager(
  File "/home//0723/verl_rollout/verl/experimental/agent_loop/agent_loop.py", line 423, in __init__
    self._initialize_llm_servers()
  File "/home//0723/verl_rollout/verl/experimental/agent_loop/agent_loop.py", line 475, in _initialize_llm_servers
    ray.get([server.init_engine.remote() for server in self.async_llm_servers])
ray.exceptions.RayTaskError(AttributeError): ray::AsyncvLLMServer.init_engine() (pid=361610, ip=90.90.97.74, actor_id=06fec6cee17e2bfad0a4566a01000000, repr=<verl.workers.rollout.vllm_rollout.vllm_async_server.AsyncvLLMServer object at 0xffcfb1075390>)
  File "/root/anaconda3/envs/verl_/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/root/anaconda3/envs/verl_/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home//0723/verl_rollout/verl/workers/rollout/vllm_rollout/vllm_async_server.py", line 266, in init_engine
    vllm_config = self._create_engine_config(engine_args)
  File "/home//0723/verl_rollout/verl/workers/rollout/vllm_rollout/vllm_async_server.py", line 294, in _create_engine_config
    zmq_addresses = ray.get([worker.get_zeromq_address.remote() for worker in workers])
  File "/home//0723/verl_rollout/verl/workers/rollout/vllm_rollout/vllm_async_server.py", line 294, in <listcomp>
    zmq_addresses = ray.get([worker.get_zeromq_address.remote() for worker in workers])
  File "/root/anaconda3/envs/verl_/lib/python3.10/site-packages/ray/actor.py", line 1534, in __getattr__
    raise AttributeError(
AttributeError: 'ActorHandle' object has no attribute 'get_zeromq_address'

Aug 01 '25 10:08 HelloWorldBeginner

I’m running into the same issue—have you made any progress toward a fix or workaround?

Aug 10 '25 12:08 ColorDavid

Same here. Any fix added for this?

Aug 15 '25 22:08 engint-cerebras

Same issue here. My main package versions are:

vllm 0.10.2, ray 2.49.0, torch 2.8.0, verl 0.5.0, megatron 0.5.1
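
For anyone else reporting versions here, a small sketch like the one below can collect them in one go. It only uses the standard library; the distribution names in `PACKAGES` are an assumption, so adjust them to whatever `pip list` shows in your environment.

```python
from importlib import metadata

# Assumed distribution names; change these to match your environment.
PACKAGES = ("verl", "vllm", "ray", "torch")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name} {version}")
```

Pasting that output together with the verl commit id should make reports easier to compare.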

Sep 18 '25 08:09 yangdongdong2000

Same issue, using verl commit c5b189a1af496d0bc68320cd1d5bd7a1f1e3638a.

Sep 20 '25 17:09 Li-dongyang

This is related to the agent loop. When the mode is set to async, AgentLoopManager is used; see ppo/ray_trainer.py and main_ppo.py for an example.

DAPO with async does not work because recipe/dapo/main_dapo.py skips some initialization of the Ray environment (or something else I am not familiar with) that AgentLoopManager relies on.

So to get DAPO working with async, recipe/dapo/main_dapo.py and recipe/dapo/dapo_ray_trainer.py need to be modified to perform initialization similar to PPO's. The group filtering based on reward results may need to change as well.
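
As a rough way to check this before launching a job: the traceback shows the async server calling `get_zeromq_address` on each worker actor, and Ray raises the `AttributeError` when the actor's class does not define that method. The sketch below checks a worker class for that method up front. `missing_methods`, `SyncOnlyWorker`, and `AsyncCapableWorker` are hypothetical names for illustration, not verl classes; you would pass in whichever worker class your entry point actually registers.

```python
# Method name taken from the traceback; the async vLLM server calls it on every worker.
REQUIRED_ASYNC_METHODS = ("get_zeromq_address",)


def missing_methods(worker_cls, required=REQUIRED_ASYNC_METHODS):
    """Return the names in `required` that `worker_cls` does not define as callables."""
    return [name for name in required if not callable(getattr(worker_cls, name, None))]


# Stand-in classes for illustration only; these are NOT verl classes.
class SyncOnlyWorker:
    def generate_sequences(self, prompts):
        return prompts


class AsyncCapableWorker(SyncOnlyWorker):
    def get_zeromq_address(self):
        return "ipc:///tmp/example.sock"


if __name__ == "__main__":
    for cls in (SyncOnlyWorker, AsyncCapableWorker):
        print(cls.__name__, "missing:", missing_methods(cls))
```

If the class used by main_dapo.py reports the method as missing while the one used by main_ppo.py does not, that would point at the worker registration as the difference to port over.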

Sep 29 '25 11:09 holyseven

Does anyone know whether the DAPO recipe supports async mode yet, or whether there are plans to add it?

Oct 24 '25 07:10 PokeLu

Any resolution on this? @holyseven What specific changes were needed?

Nov 10 '25 18:11 gtangg12