Overcooked Environment
This PR introduces an Overcooked cooking game environment for PufferLib. Sprites are from OvercookedAI.
Currently training a model for the demo. The model is not yet producing meaningful actions; still debugging.
Added intermediate reward shaping to the Overcooked environment to encourage cooperative cooking behavior and provide more frequent learning signals.
Changes
- Onion to pot: +0.1 reward when an agent adds an onion to a pot
- Correct recipe start: +0.1 reward when starting to cook a pot with exactly 3 onions (the target recipe)
- Soup plating: +0.1 reward when transferring a cooked soup from pot to plate
- Dish serving:
  - Correct recipe (3 onions): +5.0 to the serving agent, +20.0 to all agents
  - Incorrect recipe: +0.1 to all agents (small consolation reward)
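
For reviewers, the reward table above can be sketched roughly as follows. This is an illustrative Python sketch only, not the environment's actual code: the `Event` names and the `shaped_rewards` helper are hypothetical, and it assumes the +5.0 serving bonus stacks on top of the +20.0 shared reward (so the serving agent would receive 25.0 in total).

```python
from enum import Enum, auto


class Event(Enum):
    # Hypothetical event names, one per rewarded action in the list above.
    ONION_TO_POT = auto()
    CORRECT_RECIPE_START = auto()
    SOUP_PLATED = auto()
    SERVE_CORRECT = auto()
    SERVE_INCORRECT = auto()


# Reward given only to the agent that triggered the event.
ACTOR_REWARD = {
    Event.ONION_TO_POT: 0.1,
    Event.CORRECT_RECIPE_START: 0.1,
    Event.SOUP_PLATED: 0.1,
    Event.SERVE_CORRECT: 5.0,
}

# Reward given to every agent, including the one that triggered the event.
SHARED_REWARD = {
    Event.SERVE_CORRECT: 20.0,
    Event.SERVE_INCORRECT: 0.1,
}


def shaped_rewards(event, actor, num_agents):
    """Return one reward per agent for a single event.

    Assumption: actor and shared rewards are additive.
    """
    rewards = [SHARED_REWARD.get(event, 0.0)] * num_agents
    rewards[actor] += ACTOR_REWARD.get(event, 0.0)
    return rewards


if __name__ == "__main__":
    # Agent 0 serves a correct 3-onion soup in a 2-agent game.
    print(shaped_rewards(Event.SERVE_CORRECT, actor=0, num_agents=2))
```

If the serving bonus is meant to replace the shared reward for the serving agent rather than add to it, only the final line of `shaped_rewards` would need to change.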
I am now getting decent performance trajectories when training, but I could use some help. Still not sure if this will work out. 🙇
Explained variance is in the positive region! I assume this is a good sign?
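
For context, explained variance is commonly defined as 1 - Var(returns - values) / Var(returns): a value of 1 means the value function predicts returns perfectly, 0 means it does no better than predicting the mean return, and negative values mean it does worse. A minimal pure-Python sketch of that standard definition follows. The function name is illustrative, and this is not necessarily how PufferLib computes the logged metric.

```python
from statistics import pvariance


def explained_variance(values, returns):
    """1 - Var(returns - values) / Var(returns), using population variance."""
    var_returns = pvariance(returns)
    if var_returns == 0:
        # Undefined when the returns are constant.
        return float("nan")
    residuals = [r - v for r, v in zip(returns, values)]
    return 1.0 - pvariance(residuals) / var_returns


if __name__ == "__main__":
    returns = [1.0, 2.0, 3.0]
    print(explained_variance([1.0, 2.0, 3.0], returns))  # perfect predictions
    print(explained_variance([2.0, 2.0, 2.0], returns))  # always predicts the mean
```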
Any advice on what to try or change?
Hmm... I tried training with 1 agent, and the net can't fully learn how to cook.
Do you mind describing your reward structure and rules? Maybe I can help.
@Hadrien-Cr Hello! I have written a concise README describing the rewards and observations. https://github.com/mmbajo/PufferLib/tree/roze-overcooked-dev/pufferlib/ocean/overcooked