
Error using a trained PPO policy

Open gbuonamico opened this issue 5 years ago • 16 comments

Hello, I'm trying to use a PPO tf-agent with a trained policy, but I get the following error


ValueError                                Traceback (most recent call last)
in ()
      1
----> 2 evaluate(environment_eval, eval_env, eval_policy, num_episodes=3)

in evaluate(py_environment, tf_environment, policy, num_episodes)
     41
     42         while not time_step.is_last():
---> 43             action, policy_state, _ = policy.action(time_step, policy_state)
     44             time_step = tf_environment.step(action)
     45             print(py_environment.render())

/Users/luca/venv/lib/python3.7/site-packages/tf_agents/policies/tf_policy.py in action(self, time_step, policy_state, seed)
    276
    277     if self._automatic_state_reset:
--> 278       policy_state = self._maybe_reset_state(time_step, policy_state)
    279     step = action_fn(time_step=time_step, policy_state=policy_state, seed=seed)
    280

/Users/luca/venv/lib/python3.7/site-packages/tf_agents/policies/tf_policy.py in _maybe_reset_state(self, time_step, policy_state)
    241     # time_step in the sequence as we can't easily generalize how the policy is
    242     # unrolled over the sequence.
--> 243     if nest_utils.get_outer_rank(time_step, self._time_step_spec) > 1:
    244       condition = time_step.is_first()[:, 0, ...]
    245       return nest_utils.where(condition, zero_state, policy_state)

/Users/luca/venv/lib/python3.7/site-packages/tf_agents/utils/nest_utils.py in get_outer_rank(tensors, specs)
    524         'Saw tensor_shapes:\n  %s\n'
    525         'And spec_shapes:\n  %s' %
--> 526         (num_outer_dims, tensor_shapes, spec_shapes))
    527
    528

ValueError: Received a mix of batched and unbatched Tensors, or Tensors are not compatible with Specs. num_outer_dims: 1.
Saw tensor_shapes:
  [TensorShape([1]), TensorShape([1]), TensorShape([1]), TensorShape([1, 960, 18])]
And spec_shapes:
  [TensorShape([]), TensorShape([]), TensorShape([]), TensorShape([1, 960, 18])]


Here are my agent and network definitions:


def create_networks(tf_env, conv_layer_params):
    actor_net = ActorDistributionRnnNetwork(
        tf_env.observation_spec(),
        tf_env.action_spec(),
        conv_layer_params=None,
        input_fc_layer_params=(200, 100),
        lstm_size=(200, 100),
        output_fc_layer_params=None)
    value_net = ValueRnnNetwork(
        tf_env.observation_spec(),
        conv_layer_params=None,
        input_fc_layer_params=(200, 100),
        lstm_size=(200, 100),
        output_fc_layer_params=None,
        activation_fn=tf.nn.elu)

    return actor_net, value_net

actor_net, value_net = create_networks(tf_env, conv_layer_params)

agent = ppo_agent.PPOAgent(
    tf_env.time_step_spec(),
    tf_env.action_spec(),
    optimizer,
    actor_net=actor_net,
    value_net=value_net,
    num_epochs=num_epochs,
    gradient_clipping=0.2,
    entropy_regularization=1e-2,
    importance_ratio_clipping=0.2,
    use_gae=True,
    use_td_lambda_return=True)

agent.initialize()

eval_policy = agent.policy

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    agent.collect_data_spec,
    batch_size=tf_env.batch_size,
    max_length=replay_buffer_capacity)

policy_checkpointer = common.Checkpointer(
    ckpt_dir=policy_dir,
    policy=eval_policy,
    global_step=global_step)

policy_checkpointer.initialize_or_restore()


and I use the following function to get action values (it raises the error above when I call it):


def evaluate(py_environment: PyEnvironment, tf_environment: TFEnvironment, policy: tf_policy.Base, num_episodes=10):

    for episode in range(num_episodes):
        logging.info("Generating episode %d of %d" % (episode, num_episodes))
        state = policy.get_initial_state(tf_environment.batch_size)
        time_step = tf_environment.reset()
        policy_state = policy.get_initial_state(tf_environment.batch_size)
       
        
        while not time_step.is_last():
            action, policy_state, _ = policy.action(time_step, policy_state)
            time_step = tf_environment.step(action)
            print(py_environment.render())

evaluate(environment_eval, eval_env, eval_policy, num_episodes=3)


Any idea?

Thank you!

gbuonamico avatar Oct 04 '20 15:10 gbuonamico

Should your observation spec in your environment be TensorShape([960, 18]) instead of TensorShape([1, 960, 18])?

summer-yue avatar Oct 05 '20 19:10 summer-yue

Hello, thank you for replying. No, because using an LSTM in the actor and value networks requires the additional dimension. Training works fine; this is the portion of code I use for training:


def train_step():
    trajectories = replay_buffer.gather_all()
    return tf_agent.train(experience=trajectories)

collect_time = 0
train_time = 0
time_step = None
timed_at_step = global_step.numpy()

while environment_steps_metric.result() < num_environment_steps:
    current_metrics = []

    start_time = time.time()
    #collect_driver.run()
    collect_driver.run()
    collect_time += time.time() - start_time

    start_time = time.time()
    total_loss, _ = train_step()
    replay_buffer.clear()
    train_time += time.time() - start_time


gbuonamico avatar Oct 06 '20 04:10 gbuonamico

Hello, any suggestion would be appreciated....

gbuonamico avatar Oct 09 '20 04:10 gbuonamico

Sorry about the delay. Let me take a closer look this afternoon.

summer-yue avatar Oct 09 '20 17:10 summer-yue

Not a problem. I think the problem comes from the use of the train_step = common.function(train_step) wrapper in the training phase. Please keep this in mind tomorrow and let me know...

gbuonamico avatar Oct 09 '20 19:10 gbuonamico

The ValueError you're seeing says that the time_step you pass into policy.action is not aligned with the spec it is expecting. Could you try not including the additional dimension in your observation spec, despite using an LSTM? I think the agent handles that later on by checking the network.

I also tried running your code on Cartpole and it finished successfully, so the issue seems to be with the environment.

Adding @oars, who's more knowledgeable than me on this front, to confirm.

summer-yue avatar Oct 09 '20 23:10 summer-yue

I agree that time_step and policy_state are not aligned: time_step is batched, while policy_state (the initial state) is not.

If I don't include the additional dimension in the observation spec, I get the following error while trying to load the checkpoint (which is what I expected): "ValueError: Shapes (960, 18) and (1, 960, 18) are incompatible".

If you look at the error, both tensors have the right dimensions for the observation (1, 960, 18), but they differ in that time_step is batched (it has dimension [1] in the three fields before the observation shape) while policy_state is not (it has dimension [] in the same fields):


ValueError: Received a mix of batched and unbatched Tensors, or Tensors are not compatible with Specs. num_outer_dims: 1.
Saw tensor_shapes:
  [TensorShape([1]), TensorShape([1]), TensorShape([1]), TensorShape([1, 960, 18])]
And spec_shapes:
  [TensorShape([]), TensorShape([]), TensorShape([]), TensorShape([1, 960, 18])]
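For reference, the same outer-rank mix can be reproduced outside my environment with a small hand-built example (just a sketch using the public tf_agents utilities, not code from my project):

import tensorflow as tf
from tf_agents.trajectories import time_step as ts
from tf_agents.utils import nest_utils

# Observation spec with the extra leading 1, as in my environment.
time_step_spec = ts.time_step_spec(tf.TensorSpec([1, 960, 18], tf.float32))

# step_type/reward/discount carry a batch dimension of 1, but the observation
# matches the spec exactly, so the outer ranks disagree (1 vs 0).
batched_time_step = ts.TimeStep(
    step_type=tf.constant([0], dtype=tf.int32),
    reward=tf.constant([0.0], dtype=tf.float32),
    discount=tf.constant([1.0], dtype=tf.float32),
    observation=tf.zeros([1, 960, 18], dtype=tf.float32))

# Raises the same "mix of batched and unbatched Tensors" ValueError.
nest_utils.get_outer_rank(batched_time_step, time_step_spec)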


My question is: how can I add this batch dimension to the policy_state observation?

Additional note: in the training phase (which works fine), I get the same error while loading the checkpoint if I do not use common.function for train_step and agent.train....

gbuonamico avatar Oct 10 '20 08:10 gbuonamico

Could you remove the extra dimension in your spec (not in your observation), such that spec_shapes is [960, 18] while your observation is still [1, 960, 18]?

summer-yue avatar Oct 13 '20 17:10 summer-yue

Sorry, but I don't understand what you mean. In my environment the definitions of action_spec and observation_spec are the following:


self._action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=2, name='action')

ns = (1, self.shape[0], self.shape[1])
self._observation_spec = array_spec.BoundedArraySpec(
    shape=ns, dtype=np.float32, name='observation')


where self.shape[0] and self.shape[1] are dimensions given as input (960, 18).

These are the only "spec" definitions I have in my environment.

Would you mind being a little more specific, please?

gbuonamico avatar Oct 14 '20 19:10 gbuonamico

Sure. I was suggesting to modify your observation spec to:

# Note that we are removing the extra 1 at the front here.
ns = (self.shape[0], self.shape[1])
self._observation_spec = array_spec.BoundedArraySpec(
    shape=ns, dtype=np.float32, name='observation')

And keep the actual observation data as what you had before.

summer-yue avatar Oct 14 '20 19:10 summer-yue

That's what I did, as you suggested, and that's where I got the error I mentioned in my previous comment:

"ValueError: Shapes (960, 18) and (1, 960, 18) are incompatible"

gbuonamico avatar Oct 15 '20 19:10 gbuonamico

Are you able to provide the code for your environment? Ideally something not too complicated. Since I cannot reproduce the issue you're seeing in standard environments, it's a bit hard to debug from my end. Thanks!

summer-yue avatar Oct 20 '20 18:10 summer-yue

Well, that's not possible, as the environment needs a database and additional procedures to run. But it's a standard Python environment wrapped into a TFEnvironment, and no changes are made to the action_spec and observation_spec you have seen before.

Just for my understanding (then I will stop bothering you): the error message points to the difference in the dimensions of the first three fields of

Saw tensor_shapes: [TensorShape([1]), TensorShape([1]), TensorShape([1]), TensorShape([1, 960, 18])]
And spec_shapes: [TensorShape([]), TensorShape([]), TensorShape([]), TensorShape([1, 960, 18])]

while the shapes of the observation itself are fine in both tensor_shapes and spec_shapes ([1, 960, 18] in both). To me this looks like something not related to the trained agent, but perhaps to the policy saver or to one of the wrappers I use (like common.function), although my knowledge of these functions is quite limited.

Again, thank you for your time.

gbuonamico avatar Oct 21 '20 11:10 gbuonamico

Sorry that the previous suggestions weren't as helpful as I wished. I think I might understand where the confusion is. Let me try again.

The error message says that the received tensor shapes and the spec shapes are "not compatible", though the word compatible isn't very well defined. If you look closer at the code where it's erring out, nest_utils.is_batched_nested_tensors, you will notice that tensor_shapes and spec_shapes are not required to be exactly the same. Both cases below are considered compatible:

  1. tensor_shapes and spec_shapes are completely aligned, both unbatched.
  2. tensor_shapes has one or more extra outer dimensions than spec_shapes in every field. For example, tensor_shapes is [TensorShape([1]), TensorShape([1]), TensorShape([1]), TensorShape([1, 960, 18])] and spec_shapes is [TensorShape([]), TensorShape([]), TensorShape([]), TensorShape([960, 18])]; note that each tensor shape has one extra batch dimension. This is compatible, and it is why I suggested earlier to reduce spec_shapes.observation from [1, 960, 18] to [960, 18], making it compatible with the batched observation (batch size 1) used in your eval code (see the sketch after this list).
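To make case 2 concrete with the shapes from this issue, here is a small sketch (mine, not from the TF-Agents docs) that checks the outer rank against a [960, 18] observation spec:

import tensorflow as tf
from tf_agents.trajectories import time_step as ts
from tf_agents.utils import nest_utils

# Spec without the extra leading 1.
spec = ts.time_step_spec(tf.TensorSpec([960, 18], tf.float32))

# Every field carries exactly one extra (batch) dimension -> compatible.
batched = ts.TimeStep(
    step_type=tf.constant([0], dtype=tf.int32),
    reward=tf.constant([0.0], dtype=tf.float32),
    discount=tf.constant([1.0], dtype=tf.float32),
    observation=tf.zeros([1, 960, 18], dtype=tf.float32))

print(nest_utils.get_outer_rank(batched, spec))  # 1, i.e. one batch dimension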

I can see why you think it's a policy saver or wrapper issue. It's possible; maybe I didn't understand your issue very well. To clarify: in your code, before you save and reload the policy, if you call evaluate on agent.policy right after training, do you see the same issue? If not, that would point to a bug in PolicySaver. I think it is extremely unlikely that the common.function wrapper would change the spec dimensions.

summer-yue avatar Oct 21 '20 17:10 summer-yue

Thank you for your answer. In the end I'm using the workaround below (the tf.expand_dims / ts.restart lines; not sure it's great, but it seems to work):

t_step = tf_environment.reset()
t_step = tf.expand_dims(t_step.observation, axis=0)
time_step = ts.restart(t_step, tf_environment.batch_size)
state = policy.get_initial_state(tf_environment.batch_size)
i = 0
while not time_step.is_last():
    policy_step: PolicyStep = policy.action(time_step, state)
    state = policy_step.state
    time_step = tf_environment.step(policy_step.action)
    time_step = tf.expand_dims(time_step.observation, axis=0)
    time_step = ts.restart(time_step, batch_size=tf_environment.batch_size)
    if (i % 500 == 0):
        print(py_environment.render(), 'Run:', i, 'Action', policy_step.action.numpy())
    i += 1
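A variant of the same workaround that keeps the environment's step_type, reward and discount (so is_last() still comes from the environment instead of being reset to FIRST by ts.restart) could look like this. This is only a sketch, untested against the real environment, and it assumes the environment keeps emitting observations of shape [1, 960, 18] as in the error above:

import tensorflow as tf

# TimeStep is a namedtuple, so _replace lets us swap in an observation with
# the extra leading dimension the spec expects, keeping the other fields.
def expand_observation(time_step):
    return time_step._replace(
        observation=tf.expand_dims(time_step.observation, axis=1))

time_step = expand_observation(tf_environment.reset())
state = policy.get_initial_state(tf_environment.batch_size)
while not time_step.is_last():
    policy_step = policy.action(time_step, state)
    state = policy_step.state
    time_step = expand_observation(tf_environment.step(policy_step.action))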

gbuonamico avatar Oct 22 '20 20:10 gbuonamico

But I remain frustrated at not really understanding the root problem.... Thank you for your patience!

gbuonamico avatar Oct 22 '20 20:10 gbuonamico