domanspr comments



domanspr

action output and policy_step_spec structures do not match:

Should that be changed for the PPO policy? All other GreedyPolicies/Actor Policies have a different structure, one that includes the scale_diag argument (as part of dist_params).
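To illustrate the kind of mismatch described above, here is a minimal pure-Python sketch of a structural comparison between two nested info dictionaries. It is not TF-Agents code: `same_structure` is a hypothetical helper, and the `loc` key is an assumption added for illustration. Only `dist_params` and `scale_diag` come from the comment.

```python
def same_structure(a, b):
    """Return True if two nested dict/list/tuple structures have the same shape.

    Leaf values are ignored; only container types, keys, and lengths are compared.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(same_structure(a[k], b[k]) for k in a)
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return (
            type(a) is type(b)
            and len(a) == len(b)
            and all(same_structure(x, y) for x, y in zip(a, b))
        )
    # Two leaves match only if neither side is a container.
    containers = (dict, list, tuple)
    return not isinstance(a, containers) and not isinstance(b, containers)


# Hypothetical info structures: one carries scale_diag under dist_params,
# the other does not. The "loc" key is an assumption for illustration.
info_with_scale = {"dist_params": {"loc": None, "scale_diag": None}}
info_without_scale = {"dist_params": {"loc": None}}

print(same_structure(info_with_scale, info_with_scale))
print(same_structure(info_with_scale, info_without_scale))
```

The second comparison fails because the key sets under `dist_params` differ, which is the sort of difference a "structures do not match" check would presumably report.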

action output and policy_step_spec structures do not match:

It is the same PPO policy, but the examples here use PPO's collect policy (ppo_agent.collect_policy) for collection, and for training one uses ppo_agent.policy (the greedy policy)...

action output and policy_step_spec structures do not match:

My code looks more like this example (https://www.tensorflow.org/agents/tutorials/6_reinforce_tutorial), but instead of the REINFORCE agent I use the PPO agent (https://github.com/tensorflow/agents/blob/master/tf_agents/agents/ppo/ppo_agent.py). In addition, I use RNN actor and value networks and...