
`PPOAgent` ignores `network.losses`

Open zroug opened this issue 5 years ago • 8 comments

`PPOAgent` does not use regularization losses that are defined in the `value_net` and `actor_net`. It should add them to the other losses.

— zroug, Jul 17 '20

@summer-yue Can you PTAL at this?

— kbanoop, Jul 20 '20

Could you please elaborate? I see that `l2_regularization_loss` includes the regularization losses and is added to `total_loss`. The coefficients default to 0, but if you wish to use regularization losses, you should be able to set `value_function_l2_reg` and `policy_l2_reg`.
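
For context, here is a minimal sketch of the built-in option described above (not from the thread; the environment, network sizes, learning rate, and coefficients are arbitrary):

```python
# Illustrative only: built-in L2 regularization via the PPOAgent constructor.
import tensorflow as tf
from tf_agents.agents.ppo import ppo_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import actor_distribution_network, value_network

env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

actor_net = actor_distribution_network.ActorDistributionNetwork(
    env.observation_spec(), env.action_spec(), fc_layer_params=(64,))
value_net = value_network.ValueNetwork(
    env.observation_spec(), fc_layer_params=(64,))

agent = ppo_agent.PPOAgent(
    env.time_step_spec(),
    env.action_spec(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    actor_net=actor_net,
    value_net=value_net,
    # Non-zero coefficients enable the agent's own L2 penalty on the
    # network weights, which is folded into total_loss.
    policy_l2_reg=1e-4,
    value_function_l2_reg=1e-4)
agent.initialize()
```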

— summer-yue, Jul 20 '20

I mean regularization that is added to the network with the `add_loss` function. For example, activity regularization and other more advanced forms of regularization can be implemented with it.

These losses can be obtained through the `losses` member of the `Network` class. `DqnAgent`, for example, uses them instead of implementing its own regularization:

https://github.com/tensorflow/agents/blob/dc7417b577c72af3026a761e01363e1587b256a1/tf_agents/agents/dqn/dqn_agent.py#L507
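
For illustration, a sketch of the kind of network this refers to, with regularization attached to the layers themselves so that it surfaces through `.losses` (the class name and layer sizes are made up, not from the thread):

```python
import tensorflow as tf
from tf_agents.networks import network


class RegularizedValueNet(network.Network):

  def __init__(self, observation_spec, name='RegularizedValueNet'):
    super().__init__(
        input_tensor_spec=observation_spec, state_spec=(), name=name)
    # Keras-level regularizers on the layers end up in `self.losses`
    # once the network has been built and called.
    self._hidden = tf.keras.layers.Dense(
        64,
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(1e-4),
        activity_regularizer=tf.keras.regularizers.l1(1e-5))
    self._value = tf.keras.layers.Dense(1)

  def call(self, observations, step_type=(), network_state=(), training=False):
    x = tf.cast(observations, tf.float32)
    x = self._hidden(x)
    value = tf.squeeze(self._value(x), axis=-1)
    return value, network_state


# `net.losses` then holds these regularization terms; DqnAgent folds them
# into its loss, while PPOAgent (the subject of this issue) does not.
```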

— zroug, Jul 20 '20

Oh I see. Thanks for explaining. So you are pointing out the inconsistency in the implementation between PPO and other agents in terms of where losses are calculated, not that PPO doesn't do regularization.

We are checking with the team. There might be some historical reasons for this.

— summer-yue, Jul 20 '20

Yes, exactly.

I phrased it that way because I had a network whose regularization was defined at network creation time via that function, and those losses were not applied, which is a bit surprising. Regularization that is configured in the `PPOAgent` constructor is applied just fine.

A drawback of the current implementation is that it can only do L1 and L2 regularization, and that it cannot be restricted to a specific set of weights, for example.
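
As an illustration of what the coefficient-based approach cannot express, here is a hedged sketch of a custom penalty on a single layer's weights, registered via `add_loss` (the helper name, penalty, and coefficient are made up):

```python
import tensorflow as tf


def add_orthogonality_penalty(net, layer, coef=1e-3):
  """Penalizes deviation of one specific layer's kernel from orthogonality."""
  def penalty():
    # Requires the layer to be built so that `layer.kernel` exists.
    w = layer.kernel
    gram = tf.matmul(w, w, transpose_a=True)
    return coef * tf.reduce_sum(tf.square(gram - tf.eye(tf.shape(w)[-1])))
  # Passing a zero-argument callable defers evaluation until `net.losses`
  # is read.
  net.add_loss(penalty)


# Such a term appears in net.losses, but PPOAgent currently never reads it.
```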

Thanks for checking with the team!

— zroug, Jul 20 '20

Sorry for the delayed response. We are not sure why this inconsistency exists, probably for historical reasons. We can look into this once we get some bandwidth. Also, feel free to submit a PR and contribute yourself :)

— summer-yue, Aug 13 '20

This is simply an oversight. I think we need to add an agent validator to ensure agents properly use layer losses, or raise an exception if they don't support separate losses.
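
For instance, a hypothetical sketch of the "raise an exception" option (not existing TF-Agents code; a real check would need to run after the networks are built, since e.g. kernel regularizer losses only appear then):

```python
def reject_unconsumed_network_losses(*networks):
  """Refuses networks that carry layer losses the agent will never consume."""
  for net in networks:
    if net.losses:
      raise ValueError(
          f'{net.name} defines {len(net.losses)} loss(es) via add_loss, but '
          'this agent does not fold network.losses into its training loss.')
```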

— ebrevdo, Apr 18 '21

@oars fyi, can you include this in the validator bug?

— ebrevdo, Apr 18 '21