Mava
[INVESTIGATION] MADDPG/MAD4PG are slower than MAPPO in certain instances (pixel-based environments?)
What do you want to investigate?
MADDPG and MAD4PG are both significantly slower than MAPPO in certain instances.
- Run MADDPG on Coop pong/PCB Grid for n steps with the same network size
- Run MAPPO on Coop pong/PCB Grid for n steps with the same network size
- Observe that MAPPO takes significantly less time to run the same number of executor steps:
  - PPO: ~2:45 hours for 2e6 executor steps
  - D4PG: ~36 hours for 2e6 executor steps
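To put the two timings on a common scale, here is a rough back-of-the-envelope sketch that converts them to executor steps per second. It assumes "~2:45 hours" means 2 hours 45 minutes and that the PPO/D4PG figures refer to the MAPPO and MAD4PG runs; the function and variable names are illustrative only and are not part of Mava.

```python
# Rough throughput comparison from the approximate timings reported above.
# Assumption: "~2:45 hours" is read as 2 h 45 min = 2.75 h.

def steps_per_second(steps: float, hours: float) -> float:
    """Average executor steps per second over a whole run."""
    return steps / (hours * 3600.0)

EXECUTOR_STEPS = 2e6
MAPPO_HOURS = 2.75    # ~2:45 hours (approximate)
MAD4PG_HOURS = 36.0   # ~36 hours (approximate)

mappo_sps = steps_per_second(EXECUTOR_STEPS, MAPPO_HOURS)
mad4pg_sps = steps_per_second(EXECUTOR_STEPS, MAD4PG_HOURS)

# How many times slower the D4PG run is per executor step.
slowdown = mappo_sps / mad4pg_sps

print(f"MAPPO:  {mappo_sps:.1f} executor steps/s")
print(f"MAD4PG: {mad4pg_sps:.1f} executor steps/s")
print(f"Slowdown: {slowdown:.1f}x")
```

Under these assumptions the gap is roughly an order of magnitude, so a useful baseline would record the same steps-per-second figure for each environment type being compared.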
NB: This could be specific to pixel-based environments or due to other environment characteristics. It could also be a Launchpad issue, since significantly more evaluator steps are being run for MADDPG when an interval is not set.
Definition of done
- Baseline experiments highlighting specific characteristics/instances that show a clear difference in performance
- Bug fixed (in system or env) if one exists