Benjamin Black
@Trinkle23897 As for your comment that it breaks the Markov property, I think this is true. I would have to create a different environment where the previous two actions are...
Hi, what did you mean by "instead of letting MAPM changing buffer over and over again"? Is this something that the MAPM does now, or a potential solution to...
Ok, I updated the example so that the env returns the previous 2 observations, so I think the appropriate Markov property should now hold.
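For anyone reading along, here is a minimal sketch of the idea, assuming a Gym-style single-agent env with a Box observation space and the older step/reset API; the wrapper name is made up for illustration and is not part of the actual example:

```python
import numpy as np
import gym


class LastTwoObsWrapper(gym.Wrapper):
    """Illustrative wrapper (hypothetical name): returns the previous and current
    observations concatenated, so a policy that only sees a single 'observation'
    still gets enough history for the Markov property to hold."""

    def __init__(self, env):
        super().__init__(env)
        low = np.concatenate([env.observation_space.low] * 2)
        high = np.concatenate([env.observation_space.high] * 2)
        self.observation_space = gym.spaces.Box(
            low=low, high=high, dtype=env.observation_space.dtype
        )
        self._prev_obs = None

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._prev_obs = obs
        # At reset there is no history yet, so duplicate the first observation.
        return np.concatenate([obs, obs])

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        stacked = np.concatenate([self._prev_obs, obs])
        self._prev_obs = obs
        return stacked, reward, done, info
```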
@p-veloso I just saw this, but while the SuperSuit example here: https://github.com/PettingZoo-Team/SuperSuit#parallel-environment-vectorization is for Stable Baselines, all it does is translate the parallel environment into a vector environment. Since tianshou...
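Roughly, the conversion looks like the sketch below. This assumes a SuperSuit version that exposes pettingzoo_env_to_vec_env_v1 and concat_vec_envs_v1 (the version suffixes have changed over time), and pistonball is just a stand-in parallel PettingZoo environment:

```python
import supersuit as ss
from pettingzoo.butterfly import pistonball_v6  # any parallel-API PettingZoo env works

# Start from a parallel-API PettingZoo environment.
env = pistonball_v6.parallel_env()

# Treat each agent as one "environment" in a vector env: observations from all
# agents are batched along the first axis, and actions are expected the same way.
vec_env = ss.pettingzoo_env_to_vec_env_v1(env)

# Optionally concatenate several copies of the game for a larger batch; base_class
# selects which vector-env API the result follows, so it is not SB3-specific.
vec_env = ss.concat_vec_envs_v1(vec_env, 4, num_cpus=0, base_class="gym")
```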
I agree that keeping the algorithm (A2C, TD3, etc.) separate from the framework (Ape-X, parameter sharing, etc.) is a powerful way of supporting a wide variety of use cases easily....
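To make the separation concrete, here is a rough sketch of what I mean; the class and method names are hypothetical, not anything from an existing library:

```python
from typing import Any, Protocol


class Agent(Protocol):
    """Minimal algorithm interface (hypothetical): A2C, TD3, DQN, etc. can all sit
    behind this, and the framework layer never needs to know which one it is."""

    def act(self, observation: Any) -> Any: ...
    def observe(self, observation: Any, reward: float, done: bool) -> None: ...


class ParameterSharing:
    """Framework layer (hypothetical): one shared learner controls every agent name
    in a multi-agent env, independent of which algorithm that learner runs."""

    def __init__(self, shared_agent: Agent, agent_names):
        self.shared_agent = shared_agent
        self.agent_names = list(agent_names)

    def act(self, agent_name: str, observation: Any) -> Any:
        # Every agent name is routed to the same underlying learner.
        return self.shared_agent.act(observation)

    def observe(self, agent_name: str, observation: Any, reward: float, done: bool) -> None:
        self.shared_agent.observe(observation, reward, done)
```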
@cpnota One thing blocking this is that several internal features, including the generalized advantage buffer used by PPO, only work for parallel agents, and there is no parallel multi-agent experiment.
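For reference, the computation that buffer implements is roughly standard generalized advantage estimation over a batch of parallel environments; this is a generic sketch (array shapes are assumptions), not the library's actual code:

```python
import numpy as np


def gae(rewards, values, dones, last_values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over parallel environments.

    rewards, values, dones: arrays of shape (T, num_envs)
    last_values: value estimates for the state after the final step, shape (num_envs,)
    Returns advantages of shape (T, num_envs).
    """
    T = rewards.shape[0]
    advantages = np.zeros_like(rewards)
    next_advantage = np.zeros_like(last_values)
    next_values = last_values
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # TD error for step t, zeroing the bootstrap term when the episode ended.
        delta = rewards[t] + gamma * next_values * not_done - values[t]
        next_advantage = delta + gamma * lam * not_done * next_advantage
        advantages[t] = next_advantage
        next_values = values[t]
    return advantages
```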
I tried this with A2C (same code, just with a2c) and got the following error:
```
Traceback (most recent call last):
  File "independent_atari.py", line 7, in
    experiment.train(frames=2e6)
  File "/home/ben/class_projs/autonomous-learning-library/all/experiments/single_env_experiment.py", line...
```
Ah, I see. Yes, it is very hard to get that from the error message. Before, when we were trying to use ALL for our primary work with PettingZoo, the...
So, for more context on this particular issue, the problem came up when someone wanted to use PPO to train one agent and DQN to train another. This is a very...
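As a rough sketch of what that use case looks like on a PettingZoo AEC env (this uses the older API where env.last() returns four values, and RandomPolicy is just a placeholder for the real PPO/DQN learners):

```python
import random
from pettingzoo.classic import tictactoe_v3  # stand-in two-player env


class RandomPolicy:
    """Placeholder for a real PPO or DQN learner; only the interface matters here."""

    def act(self, observation):
        legal = observation["action_mask"].nonzero()[0]
        return int(random.choice(legal))


env = tictactoe_v3.env()
env.reset()

# Each agent name gets its own independently trained policy; nothing forces them
# to share an algorithm, so one could be PPO and the other DQN.
policies = {agent: RandomPolicy() for agent in env.possible_agents}

for agent in env.agent_iter():
    observation, reward, done, info = env.last()
    action = None if done else policies[agent].act(observation)
    env.step(action)
```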
A couple comments:
1) I took a look at the env_checker, and didn't see anything that should affect seeding. Perhaps this is a broader issue that affects all environments?
2)...
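A quick way to check whether seeding itself is the problem, independent of the env_checker, is something like this sketch (assuming the older Gym API with env.seed() and a 4-tuple step return; make_env is a placeholder for whatever constructs the environment):

```python
import numpy as np


def rollout(env, seed, steps=100):
    """Collect a short trajectory with a fixed seed for both env and action sampling."""
    env.seed(seed)                 # older Gym-style seeding
    env.action_space.seed(seed)
    observations = [env.reset()]
    for _ in range(steps):
        obs, reward, done, info = env.step(env.action_space.sample())
        observations.append(obs)
        if done:
            observations.append(env.reset())
    return observations


def is_deterministic(make_env, seed=42):
    """Two identically seeded rollouts should match element-wise if seeding works."""
    a = rollout(make_env(), seed)
    b = rollout(make_env(), seed)
    return all(np.array_equal(x, y) for x, y in zip(a, b))
```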