pegazik

Results: 1 comment by pegazik

> Hi,
>
> In replay_buffer, you specified: data_spec=agent.collect_data_spec. However, the PPO agent and the RandomTFPolicy have different data_spec. (The PPO algorithm gathers n_steps steps/trajectories at each iteration.)
>
> As opposite...