pegazik
Results: 1 comment by pegazik
> Hi,
>
> In replay_buffer, you specified: data_spec=agent.collect_data_spec. However, the PPO agent and the RandomTFPolicy agent have different data_spec values. (The PPO algorithm gathers n_steps steps/trajectories at each iteration.)
>
> As opposite...