[INVESTIGATION] Updating DDPG with only the critic network yields better performance
What do you want to investigate?
The hypothesis is that updating DDPG with only the critic network yields better performance. To carry this out, the DDPG actor would be updated using the observation network and the critic using the target observation network at timestep t, which differs from standard practice (ACME included). See https://github.com/instadeepai/Mava/blob/develop/mava/systems/tf/maddpg/training.py#L258
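A minimal, self-contained sketch of the idea is given below. It is not Mava's actual trainer code: the modules (`obs_net`, `target_obs_net`, `policy_net`, `critic_net` and their target counterparts), the function `ddpg_losses`, and all shapes are hypothetical stand-ins for the per-agent networks in `mava/systems/tf/maddpg/training.py`, and it shows a single-agent DDPG update rather than the full multi-agent loop. The `proposed` flag only changes which observation network embeds `o_t` in the policy (DPG) loss; whether gradients should also be stopped into the online observation network there is left open for the investigation.

```python
import tensorflow as tf

# Hypothetical stand-ins for the per-agent modules in the MADDPG trainer;
# layer sizes and action dimensions are illustrative only.
obs_net = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="relu")])
target_obs_net = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="relu")])
policy_net = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="tanh")])
target_policy_net = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="tanh")])
critic_net = tf.keras.Sequential([tf.keras.layers.Dense(1)])
target_critic_net = tf.keras.Sequential([tf.keras.layers.Dense(1)])


def ddpg_losses(o_tm1, a_tm1, r_t, d_t, o_t, discount=0.99, proposed=True):
    """Critic and policy losses for a single DDPG agent.

    proposed=False follows the standard (ACME-style) choice: both the TD target
    and the policy (DPG) loss embed o_t with the target observation network.
    proposed=True applies the change described above: the policy loss embeds
    o_t with the online observation network, while the critic keeps the target
    observation network at timestep t.
    """
    # Timestep t-1 embedding always comes from the online observation network.
    z_tm1 = obs_net(o_tm1)
    # Timestep t embedding for the TD target: target observation network,
    # with no gradient flowing into it.
    z_t_target = tf.stop_gradient(target_obs_net(o_t))

    # Critic (TD) loss.
    q_tm1 = critic_net(tf.concat([z_tm1, a_tm1], axis=-1))
    a_t = target_policy_net(z_t_target)
    q_t = target_critic_net(tf.concat([z_t_target, a_t], axis=-1))
    td_target = tf.stop_gradient(r_t + discount * d_t * tf.squeeze(q_t, axis=-1))
    critic_loss = tf.reduce_mean(tf.square(td_target - tf.squeeze(q_tm1, axis=-1)))

    # Policy (DPG) loss: the only difference between the two variants is which
    # observation network embeds o_t. Note: in the proposed variant gradients
    # are not stopped into obs_net here; that choice is part of the investigation.
    z_t_policy = obs_net(o_t) if proposed else z_t_target
    dpg_a_t = policy_net(z_t_policy)
    dpg_q_t = critic_net(tf.concat([z_t_policy, dpg_a_t], axis=-1))
    policy_loss = -tf.reduce_mean(dpg_q_t)

    return critic_loss, policy_loss


# Smoke test with random data (batch of 8, observation dim 10, action dim 2).
o_tm1 = tf.random.normal([8, 10])
a_tm1 = tf.random.normal([8, 2])
r_t = tf.random.normal([8])
d_t = tf.ones([8])
o_t = tf.random.normal([8, 10])
print(ddpg_losses(o_tm1, a_tm1, r_t, d_t, o_t, proposed=True))
```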
Definition of done
DDPG benchmarked on environments with observation networks (co-op and pcb-grid [8x8, 3 agents]).
[Optional] Results
What was the conclusion of your investigation?
[Optional] Discussion/Future Investigations
This could be a link to a GitHub Discussions page.