off-policy
PyTorch implementations of popular off-policy multi-agent reinforcement learning algorithms, including QMix, VDN, MADDPG, and MATD3.
Traceback (most recent call last):
  File "train_mpe.py", line 157, in <module>
    main(sys.argv[1:])
  File "train_mpe.py", line 147, in main
    total_num_steps = runner.run()
  File "D:\off-policy-release\offpolicy\runner\mlp\base_runner.py", line 153, in run
    env_info = self.collecter(explore=True, training_episode=True,...
Environment visualization issue
After running the project I only get the metric data on the Weights & Biases dashboard, but the simple_XXX.py environment is never rendered. Why is that?
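For context, a minimal sketch of how a gym-style MPE scenario is typically rendered inside a rollout loop. The import paths and the one-hot action construction below assume the original openai multiagent-particle-envs package, not necessarily the exact env wrapper this repository uses; the point is only that `env.render()` must be called explicitly each step, otherwise only logged metrics appear.

```python
import numpy as np
from multiagent.environment import MultiAgentEnv  # assumed MPE import path
import multiagent.scenarios as scenarios

# Build a scenario the standard MPE way (illustrative, not the repo's make_env code).
scenario = scenarios.load("simple_spread.py").Scenario()
world = scenario.make_world()
env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)

obs = env.reset()
for _ in range(25):
    # Random one-hot movement actions (MPE's default discrete action format).
    actions = [np.eye(5)[np.random.randint(5)] for _ in env.action_space]
    obs, rewards, dones, infos = env.step(actions)
    env.render()  # opens a viewer window; without this call nothing is drawn
```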
Why does the code require only one env when using an RNN policy? https://github.com/marlbenchmark/off-policy/blob/release/offpolicy/scripts/train/train_mpe.py#L154
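For reference, a hypothetical sketch of the kind of guard the linked line appears to perform. The argument names and the set of recurrent algorithms below are assumptions, not a claim about the script's exact code; the usual rationale is that a recurrent policy carries an RNN hidden state across a whole episode, so data is collected sequentially from a single env.

```python
# Hypothetical guard; "algorithm_name", "n_rollout_threads", and the set membership
# are illustrative assumptions, not the repository's actual code.
RECURRENT_ALGOS = {"qmix", "vdn"}  # recurrent variants (illustrative)

def check_rollout_threads(algorithm_name: str, n_rollout_threads: int) -> None:
    """Recurrent policies keep an RNN hidden state over an episode, so collection
    is typically restricted to a single sequential rollout env."""
    if algorithm_name in RECURRENT_ALGOS:
        assert n_rollout_threads == 1, (
            "recurrent policies collect whole episodes sequentially; use 1 rollout env"
        )

check_rollout_threads("qmix", 1)    # passes
# check_rollout_threads("qmix", 8)  # would raise AssertionError
```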
Hello, I have encountered some problems and wonder if you can help. The message is: Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment...
**Describe the bug**
When using PER with QMIX, an issue arises with the idx_range returned by the insert function of RecPolicyBuffer:
> line 267, in insert
> for idx in range(idx_range[0],...
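To illustrate the pattern the report quotes, here is a hypothetical circular-buffer `insert` that returns an index range; the class, names, and wrap-around behavior are purely illustrative and are not the repository's actual RecPolicyBuffer code or the confirmed cause of the bug. It only shows one way a naive `range(idx_range[0], idx_range[1])` over such a pair can misbehave.

```python
import numpy as np

class ToyBuffer:
    """Hypothetical circular buffer; names and behavior are illustrative only."""
    def __init__(self, size: int):
        self.size = size
        self.step = 0
        self.data = np.zeros(size)

    def insert(self, batch: np.ndarray):
        n = len(batch)
        idxs = (self.step + np.arange(n)) % self.size
        self.data[idxs] = batch
        start, end = self.step, (self.step + n) % self.size
        self.step = end
        return (start, end)  # end < start when the write wraps around the buffer

buf = ToyBuffer(size=10)
buf.insert(np.ones(8))
idx_range = buf.insert(np.ones(5))              # wraps: returns (8, 3)
print(list(range(idx_range[0], idx_range[1])))  # [] -- a naive loop visits nothing
```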
Hello, thanks for open-sourcing really good work. I was wondering if you could open-source the MASAC code base, as it would help in understanding the variations of MASAC...
Run time
How much time is usually needed to run QMIX on MPE?
In the MQMix mixer, hyper_b2 is defined as
self.hyper_b2 = nn.Sequential(
    init_(nn.Linear(self.cent_obs_dim, self.hypernet_hidden_dim)),
    nn.ReLU(),
    init_(nn.Linear(self.hypernet_hidden_dim, 1))
).to(self.device)
Shouldn't it instead be
self.hyper_b2 = nn.Sequential(
    init_(nn.Linear(self.cent_obs_dim, self.mixer_hidden_dim)),
    nn.ReLU(),
    init_(nn.Linear(self.mixer_hidden_dim, 1))
).to(self.device)
?
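For comparison, a minimal sketch of the final-bias head as it appears in the original QMIX formulation, where the two-layer hypernetwork producing b2 (the state-dependent bias V(s)) uses the mixing-network embedding dimension as its hidden width. Variable names and sizes here are illustrative, not the repository's.

```python
import torch
import torch.nn as nn

# QMIX-style final-bias hypernetwork (illustrative names and sizes).
cent_obs_dim = 48        # dimension of the centralized state (illustrative)
mixer_hidden_dim = 32    # mixing-network embedding dim (QMIX's embed_dim)

hyper_b2 = nn.Sequential(
    nn.Linear(cent_obs_dim, mixer_hidden_dim),
    nn.ReLU(),
    nn.Linear(mixer_hidden_dim, 1),
)

state = torch.randn(4, cent_obs_dim)  # batch of centralized states
b2 = hyper_b2(state)                  # shape (4, 1): the bias added to Q_tot
```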
As I work with this code, I find that what wandb records differs somewhat from what I would intuitively expect. When I try to train mqmix in the MPE environment, in...