Mingxiao Feng
Hi @jaejaywoo, my method is to use the interfaces in `dm_env` to wrap each transition from SubprocVec into a `dm_env.TimeStep` instance and create N adders to add them into the...
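For reference, here is a rough sketch of that wrapping, assuming one Acme-style adder per sub-environment; the names `adders`, `obs_batch`, `rew_batch`, and `done_batch` are placeholders, not anything from the actual setup:

```python
import dm_env

def add_first_all(adders, obs_batch):
    # At episode start, each adder needs the initial TimeStep via add_first.
    for adder, obs in zip(adders, obs_batch):
        adder.add_first(dm_env.restart(obs))

def add_vec_step(adders, actions, obs_batch, rew_batch, done_batch):
    # Wrap each sub-env's transition in a dm_env.TimeStep and hand it
    # to the matching adder.
    for i, adder in enumerate(adders):
        if done_batch[i]:
            ts = dm_env.termination(reward=rew_batch[i], observation=obs_batch[i])
        else:
            ts = dm_env.transition(reward=rew_batch[i], observation=obs_batch[i])
        adder.add(actions[i], next_timestep=ts)
```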
Thanks a lot, @fastturtle, I will add a thread pool to parallelize the calls~
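A simple way to do that, sketched with a standard-library thread pool (assuming the adders tolerate being called from worker threads):

```python
from concurrent.futures import ThreadPoolExecutor

def add_vec_step_parallel(pool, adders, actions, timesteps):
    # One adder.add call per sub-environment, submitted concurrently.
    futures = [
        pool.submit(adder.add, act, next_timestep=ts)
        for adder, act, ts in zip(adders, actions, timesteps)
    ]
    for f in futures:
        f.result()  # block until done; re-raises any worker exception
```

Creating the `ThreadPoolExecutor` once (e.g. `max_workers=len(adders)`) and reusing it across steps avoids paying thread-startup cost on every step.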
Dear developer, the algorithm I cited is actually a centralized multi-agent reinforcement learning algorithm that is independent of the time dimension: at any given timestep, it takes in the joint observations of...
Hi! First, thanks for your reply! Regarding MAT's action prediction: in the training stage, since we can sample directly from the off-policy buffer or use on-policy samples, the...
Your understanding is correct: the way logits are tokenized differs between the training and inference phases. You can read the following code, which takes the encoder's embeddings as input and is copied in...
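Since the referenced snippet is cut off above, here is a minimal sketch of the distinction as I understand it (not the actual MAT code); `decoder` stands in for any module mapping (shifted action tokens, encoder embeddings) to per-agent logits, and agent 0's start token is simplified to a zero vector:

```python
import torch
import torch.nn.functional as F

def parallel_act(decoder, obs_emb, actions, act_dim):
    # Training: all agents' actions come from the buffer, so logits for
    # every agent are produced in one teacher-forced forward pass.
    one_hot = F.one_hot(actions, act_dim).float()  # [B, n_agents, act_dim]
    shifted = torch.zeros_like(one_hot)
    shifted[:, 1:] = one_hot[:, :-1]               # agent i sees action i-1
    return decoder(shifted, obs_emb)               # per-agent logits

def autoregressive_act(decoder, obs_emb, n_agents, act_dim):
    # Inference: agent i's action must be sampled before agent i+1's input
    # token exists, so decoding proceeds agent by agent.
    batch = obs_emb.shape[0]
    shifted = torch.zeros(batch, n_agents, act_dim)
    actions = []
    for i in range(n_agents):
        logits = decoder(shifted, obs_emb)[:, i]   # logits for agent i only
        action = torch.distributions.Categorical(logits=logits).sample()
        actions.append(action)
        if i + 1 < n_agents:
            shifted[:, i + 1] = F.one_hot(action, act_dim).float()
    return torch.stack(actions, dim=1)             # [B, n_agents]
```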
Thank you for your reply, I will consider how to implement it ^_^
Hi @arvinxx, this bug is still not fixed in the latest version. Could you take a look when you have time? Thanks a lot!