Yacine

Results: 1 comment by Yacine

Hi, in replay_buffer you specified data_spec=agent.collect_data_spec. However, the PPO agent and the RandomTFPolicy have different data_spec values. (The PPO algorithm gathers n_steps steps/trajectories at each iteration.) Unlike the DQN agent, the PPO...