Results: 3 comments by domanspr

Should that be changed for the PPO policy, given that all other greedy policies and actor policies have a different structure that includes the scale_diag argument (as part of dist_params)?

It is the same PPO policy. For collection, the examples here use PPO's collect policy (ppo_agent.collect_policy), and for training one uses ppo_agent.policy (the greedy policy)....

My code looks more like this example (https://www.tensorflow.org/agents/tutorials/6_reinforce_tutorial), but instead of the REINFORCE agent I use the PPO agent (https://github.com/tensorflow/agents/blob/master/tf_agents/agents/ppo/ppo_agent.py). In addition, I use RNN actor and value networks and...