Asynchronous API for `ParallelRLEnv`
Hello, this work looks pretty cool and I'm looking forward to using it in the future.
I was wondering if you would be interested in implementing EnvPool's asynchronous API, which looks like the following:
```python
import envpool
import numpy as np

num_envs = 64
batch_size = 16
env = envpool.make("Pong-v5", env_type="gym", num_envs=num_envs, batch_size=batch_size)
action_num = env.action_space.n

env.async_reset()  # send the initial reset signal to all envs

obs, rew, done, info = env.recv()  # returns as soon as batch_size envs are ready
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])   # actions are routed back by env_id

obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])

obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])
```
Example output:

```
(16, 4, 84, 84) [ 1 0 8 3 5 9 11 6 13 12 16 14 4 18 2 19]
(16, 4, 84, 84) [23 24 17 21 25 26 28 20 32 31 22 7 15 29 27 30]
(16, 4, 84, 84) [34 10 38 41 40 35 33 36 39 37 42 48 51 50 52 44]
```
The general idea is that each `recv()` returns only a subset of the environments, so the agent can sample actions for that batch while the remaining environments are still executing their previous actions. This approach should scale considerably better, especially when the engine backend communicates over a socket (#219). In CleanRL we have a fast PPO implementation prototype that leverages this async API (see code here).
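Roughly, the snippet above generalizes to a rollout loop like this (just a sketch: the fixed step budget and the random policy are placeholders for a real agent):

```python
import envpool
import numpy as np

num_envs = 64
batch_size = 16
env = envpool.make("Pong-v5", env_type="gym",
                   num_envs=num_envs, batch_size=batch_size)
action_num = env.action_space.n

env.async_reset()  # ask every env to reset; results arrive through recv()
for _ in range(1000):
    # recv() blocks only until batch_size envs are ready, not all num_envs
    obs, rew, done, info = env.recv()
    env_id = info["env_id"]
    # replace the random policy with the agent; actions must be paired with
    # the env_id of the environments they were computed for
    action = np.random.randint(action_num, size=batch_size)
    env.send(action, env_id)
```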
https://github.com/Farama-Foundation/Gymnasium/pull/98 also contains an example of implementing this type of Async API with existing vectorized environments.
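For concreteness, here is a purely illustrative, thread-based sketch of how `recv`/`send` semantics could be layered on top of plain Gymnasium envs. The class and attribute names (`AsyncBatchEnv`, `ready`, `pending`) are invented for this example, and a production version would likely use processes or a C++ backend like EnvPool rather than Python threads:

```python
import queue
import threading

import gymnasium as gym
import numpy as np


class AsyncBatchEnv:
    """Hypothetical sketch of send/recv semantics over plain Gymnasium envs.

    Each env runs in its own worker thread; recv() returns as soon as
    batch_size envs have produced a new observation, so the agent never
    waits on the slowest environment.
    """

    def __init__(self, env_fns, batch_size):
        self.envs = [fn() for fn in env_fns]
        self.batch_size = batch_size
        self.ready = queue.Queue()                         # finished (env_id, obs, rew, done)
        self.pending = [queue.Queue() for _ in self.envs]  # actions waiting per env
        for i, env in enumerate(self.envs):
            threading.Thread(target=self._worker, args=(i, env), daemon=True).start()

    def _worker(self, env_id, env):
        obs, _ = env.reset()
        self.ready.put((env_id, obs, 0.0, False))          # initial "reset" result
        while True:
            action = self.pending[env_id].get()            # block until the agent sends one
            obs, rew, terminated, truncated, _ = env.step(action)
            if terminated or truncated:
                obs, _ = env.reset()                       # auto-reset, like EnvPool
            self.ready.put((env_id, obs, rew, terminated or truncated))

    def recv(self):
        batch = [self.ready.get() for _ in range(self.batch_size)]
        env_id, obs, rew, done = zip(*batch)
        return np.stack(obs), np.array(rew), np.array(done), {"env_id": np.array(env_id)}

    def send(self, actions, env_id):
        for a, i in zip(actions, env_id):
            self.pending[i].put(a)


# usage: 8 CartPole envs, act on the 4 fastest at a time
envs = AsyncBatchEnv([lambda: gym.make("CartPole-v1") for _ in range(8)], batch_size=4)
obs, rew, done, info = envs.recv()
actions = np.random.randint(2, size=4)
envs.send(actions, info["env_id"])
```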
Ah, very cool @vwxyzjn! I'm still learning how this differs from MultiProcessRLEnv (source). In general I agree that this would be good to support; all the best RL infrastructures I know of have been going async like this.
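To spell out the difference as I currently understand it, here is a pseudocode-level sketch (`vec_env`, `policy`, and `num_steps` are placeholders, not actual APIs from this repo):

```python
# Synchronous (lock-step) vectorized stepping: one step() call drives all
# num_envs environments and only returns once every one of them has finished,
# so the slowest environment sets the pace.
obs = vec_env.reset()
for _ in range(num_steps):
    actions = policy(obs)                  # actions for all num_envs envs
    obs, rew, done, info = vec_env.step(actions)

# Asynchronous stepping: recv() hands back the first batch_size environments
# that are ready, the agent acts only on those, and the rest keep running.
env.async_reset()
for _ in range(num_steps):
    obs, rew, done, info = env.recv()      # only batch_size envs
    actions = policy(obs)
    env.send(actions, info["env_id"])
```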
I think @edbeeching will comment when he's back in a couple days.