Rujikorn Charakorn
## Problem Description It seems that the current implementation of PPO uses only a one-step entropy bonus (the entropy bonus is not included in the overall return). I see this as an ease...
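A minimal sketch of the distinction being raised, with illustrative `rewards`, `entropies`, `gamma`, and `beta` values that are assumptions, not taken from any codebase: in (a) the entropy term only regularizes the loss at each step, while in (b) it is treated as extra reward and therefore discounted and bootstrapped through the return, as in max-entropy RL.

```python
import numpy as np

rewards = np.array([1.0, 0.0, 1.0])    # per-step environment rewards (toy values)
entropies = np.array([0.9, 0.7, 0.5])  # per-step policy entropies H(pi(.|s_t))
gamma, beta = 0.99, 0.01

# (a) One-step entropy bonus: the return targets ignore entropy; the bonus
#     only appears as a regularizer in the surrogate loss, e.g.
#     loss ~ policy_loss - beta * entropies.mean()
returns_a = np.array([sum(gamma**k * rewards[t + k]
                          for k in range(len(rewards) - t))
                      for t in range(len(rewards))])

# (b) Entropy folded into the return: add beta * H to the reward first, so
#     the bonus propagates through the discounted return and value targets.
soft_rewards = rewards + beta * entropies
returns_b = np.array([sum(gamma**k * soft_rewards[t + k]
                          for k in range(len(rewards) - t))
                      for t in range(len(rewards))])

print(returns_a, returns_b)
```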
What is the specification of the training data (e.g. spatial resolution, or the satellite that took the images)? Also, the training data seems to be blue-shifted compared to...
Hey, thank you for such a great addition to multi-agent cooperative environments. I have been playing with the environment and noticed that its action space is bounded within [-1, 1]. But...
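One common caller-side way to respect such bounds is to clip actions into the `Box` range before `env.step`. A minimal sketch, assuming a standard Gym-style `ActionWrapper`; the wrapper below is illustrative and not part of the environment being discussed (newer Gym releases ship a similar built-in, `gym.wrappers.ClipAction`):

```python
import numpy as np
import gym

class ClipAction(gym.ActionWrapper):
    """Clip agent actions into the environment's Box bounds before stepping."""
    def action(self, action):
        return np.clip(action, self.action_space.low, self.action_space.high)

# usage (make_multiagent_env is a hypothetical constructor):
# env = ClipAction(make_multiagent_env())
```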
This allows mode='rgb_array' and 'depth_array' to return the rendered array, as in the original single-agent MuJoCo Gym env. These modes are faster than mode='human'.
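A minimal usage sketch, assuming the classic Gym (<0.26) render API that the single-agent MuJoCo envs expose; the multi-agent env id is not shown in the PR text, so `HalfCheetah-v2` stands in for illustration:

```python
import numpy as np
import gym

env = gym.make("HalfCheetah-v2")
env.reset()
rgb = env.render(mode='rgb_array')      # (H, W, 3) uint8 image, no window needed
depth = env.render(mode='depth_array')  # per-pixel depth buffer
assert isinstance(rgb, np.ndarray)
env.close()
```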
## Motivation If I understand correctly, the speedup of envpool comes from its C++ implementation as opposed to Python. So, I wonder if the XLA interface will provide any more speed...
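A conceptual sketch of why an XLA interface could help beyond the C++ backend: if the step function itself is callable from XLA, an entire rollout can be fused into one jit-compiled `lax.scan` instead of crossing the Python boundary every step. Here `policy` and the in-line dynamics are hypothetical stand-ins, not envpool's actual API:

```python
import jax
import jax.numpy as jnp

def policy(params, obs):
    return jnp.tanh(obs @ params)  # toy linear policy, for illustration only

def rollout(params, init_obs, num_steps=128):
    def body(obs, _):
        action = policy(params, obs)
        next_obs = obs + 0.1 * action  # stand-in for an XLA-callable env step
        return next_obs, (obs, action)
    _, traj = jax.lax.scan(body, init_obs, None, length=num_steps)
    return traj  # whole loop compiles to a single XLA program

traj = jax.jit(rollout)(jnp.ones((4, 4)), jnp.zeros((8, 4)))
```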
I have a question regarding the FACMAC implementation. Did you use any wrappers such as observation/reward normalization or action clipping/rescaling? Because in the original single-agent setting, implementations usually use...
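For reference, a minimal sketch of the kind of wrapper the question refers to: running observation normalization with a Welford-style mean/variance estimate. The names and constants are illustrative, not taken from the FACMAC codebase:

```python
import numpy as np
import gym

class NormalizeObservation(gym.ObservationWrapper):
    """Normalize observations with a running mean/std estimate."""
    def __init__(self, env, eps=1e-8):
        super().__init__(env)
        self.eps = eps
        self.count = 0
        self.mean = np.zeros(env.observation_space.shape)
        self.var = np.ones(env.observation_space.shape)

    def observation(self, obs):
        # Welford-style incremental update of running mean and variance
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count
        return (obs - self.mean) / np.sqrt(self.var + self.eps)

# usage (make_env is a hypothetical constructor):
# env = NormalizeObservation(make_env())
```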