[INVESTIGATION] Updating DDPG with only the critic network yields better performance
What do you want to investigate?
The hypothesis is that updating DDPG with only the critic network yields better performance. To carry this out, the DDPG actor would be updated using the observation network and the critic using the target observation network at timestep t, which differs from standard practice (ACME included). See https://github.com/instadeepai/Mava/blob/develop/mava/systems/tf/maddpg/training.py#L258
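A minimal, self-contained sketch of the idea is given below. It is not Mava's actual trainer code: the modules (`obs_net`, `target_obs_net`, `policy_net`, `critic_net` and their target counterparts), the function `ddpg_losses`, and all shapes are hypothetical stand-ins for the per-agent networks in `mava/systems/tf/maddpg/training.py`, and it shows a single-agent DDPG update rather than the full multi-agent loop. The `proposed` flag only changes which observation network embeds `o_t` in the policy (DPG) loss; whether gradients should also be stopped into the online observation network there is left open for the investigation.

```python
import tensorflow as tf

# Hypothetical stand-ins for the per-agent modules in the MADDPG trainer;
# layer sizes and action dimensions are illustrative only.
obs_net = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="relu")])
target_obs_net = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="relu")])
policy_net = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="tanh")])
target_policy_net = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="tanh")])
critic_net = tf.keras.Sequential([tf.keras.layers.Dense(1)])
target_critic_net = tf.keras.Sequential([tf.keras.layers.Dense(1)])


def ddpg_losses(o_tm1, a_tm1, r_t, d_t, o_t, discount=0.99, proposed=True):
    """Critic and policy losses for a single DDPG agent.

    proposed=False follows the standard (ACME-style) choice: both the TD target
    and the policy (DPG) loss embed o_t with the target observation network.
    proposed=True applies the change described above: the policy loss embeds
    o_t with the online observation network, while the critic keeps the target
    observation network at timestep t.
    """
    # Timestep t-1 embedding always comes from the online observation network.
    z_tm1 = obs_net(o_tm1)
    # Timestep t embedding for the TD target: target observation network,
    # with no gradient flowing into it.
    z_t_target = tf.stop_gradient(target_obs_net(o_t))

    # Critic (TD) loss.
    q_tm1 = critic_net(tf.concat([z_tm1, a_tm1], axis=-1))
    a_t = target_policy_net(z_t_target)
    q_t = target_critic_net(tf.concat([z_t_target, a_t], axis=-1))
    td_target = tf.stop_gradient(r_t + discount * d_t * tf.squeeze(q_t, axis=-1))
    critic_loss = tf.reduce_mean(tf.square(td_target - tf.squeeze(q_tm1, axis=-1)))

    # Policy (DPG) loss: the only difference between the two variants is which
    # observation network embeds o_t. Note: in the proposed variant gradients
    # are not stopped into obs_net here; that choice is part of the investigation.
    z_t_policy = obs_net(o_t) if proposed else z_t_target
    dpg_a_t = policy_net(z_t_policy)
    dpg_q_t = critic_net(tf.concat([z_t_policy, dpg_a_t], axis=-1))
    policy_loss = -tf.reduce_mean(dpg_q_t)

    return critic_loss, policy_loss


# Smoke test with random data (batch of 8, observation dim 10, action dim 2).
o_tm1 = tf.random.normal([8, 10])
a_tm1 = tf.random.normal([8, 2])
r_t = tf.random.normal([8])
d_t = tf.ones([8])
o_t = tf.random.normal([8, 10])
print(ddpg_losses(o_tm1, a_tm1, r_t, d_t, o_t, proposed=True))
```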
Definition of done
DDPG benchmarked on environments with observation networks (co-op and pcb-grid [8x8, 3 agents]).
[Optional] Results
What was the conclusion of your investigation?
[Optional] Discussion/Future Investigations
This could be a link to a GitHub Discussions page.