Results 1 issues of domanspr

If I use e.g. the Reinforce algorithm the call agent.collect_policy.action(time_step, policy_state) works as for all other algorithms except for PPO. Here the PPO Policy (which inherits from ActorPolicy) outpost the...