Rujikorn Charakorn
## Problem Description It seems that the current implementation of PPO uses only a one-step entropy bonus (the entropy bonus is not included in the overall return). I see this as an ease...
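A minimal sketch of the distinction being raised, with illustrative `rewards`, `entropies`, `gamma`, and `beta` values that are assumptions, not taken from any codebase: in (a) the entropy term only regularizes the loss at each step, while in (b) it is treated as extra reward and therefore discounted and bootstrapped through the return, as in max-entropy RL.

```python
import numpy as np

rewards = np.array([1.0, 0.0, 1.0])    # per-step environment rewards (toy values)
entropies = np.array([0.9, 0.7, 0.5])  # per-step policy entropies H(pi(.|s_t))
gamma, beta = 0.99, 0.01

# (a) One-step entropy bonus: the return targets ignore entropy; the bonus
#     only appears as a regularizer in the surrogate loss, e.g.
#     loss ~ policy_loss - beta * entropies.mean()
returns_a = np.array([sum(gamma**k * rewards[t + k]
                          for k in range(len(rewards) - t))
                      for t in range(len(rewards))])

# (b) Entropy folded into the return: add beta * H to the reward first, so
#     the bonus propagates through the discounted return and value targets.
soft_rewards = rewards + beta * entropies
returns_b = np.array([sum(gamma**k * soft_rewards[t + k]
                          for k in range(len(rewards) - t))
                      for t in range(len(rewards))])

print(returns_a, returns_b)
```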
What is the specification of the training data (e.g. spatial resolution, or the satellite that took the images)? Also, the training data seems to be blue-shifted compared to...
Hey, thank you for such a great addition to multi-agent cooperative environments. I have been playing with the environment and noticed that its action space is bounded within [-1, 1]. But...
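One common caller-side way to respect such bounds is to clip actions into the `Box` range before `env.step`. A minimal sketch, assuming a standard Gym-style `ActionWrapper`; the wrapper below is illustrative and not part of the environment being discussed (newer Gym releases ship a similar built-in, `gym.wrappers.ClipAction`):

```python
import numpy as np
import gym

class ClipAction(gym.ActionWrapper):
    """Clip agent actions into the environment's Box bounds before stepping."""
    def action(self, action):
        return np.clip(action, self.action_space.low, self.action_space.high)

# usage (make_multiagent_env is a hypothetical constructor):
# env = ClipAction(make_multiagent_env())
```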
This allows mode='rgb_array' and 'depth_array' to return the rendered array, as in the original single-agent MuJoCo Gym env. These modes are faster than mode='human'.
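A minimal usage sketch, assuming the classic Gym (<0.26) render API that the single-agent MuJoCo envs expose; the multi-agent env id is not shown in the PR text, so `HalfCheetah-v2` stands in for illustration:

```python
import numpy as np
import gym

env = gym.make("HalfCheetah-v2")
env.reset()
rgb = env.render(mode='rgb_array')      # (H, W, 3) uint8 image, no window needed
depth = env.render(mode='depth_array')  # per-pixel depth buffer
assert isinstance(rgb, np.ndarray)
env.close()
```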
## Motivation If I understand correctly, the speedup of envpool comes from its C++ implementation as opposed to Python. So, I wonder if the XLA interface will provide any more speed...
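A conceptual sketch of why an XLA interface could help beyond the C++ backend: if the step function itself is callable from XLA, an entire rollout can be fused into one jit-compiled `lax.scan` instead of crossing the Python boundary every step. Here `policy` and the in-line dynamics are hypothetical stand-ins, not envpool's actual API:

```python
import jax
import jax.numpy as jnp

def policy(params, obs):
    return jnp.tanh(obs @ params)  # toy linear policy, for illustration only

def rollout(params, init_obs, num_steps=128):
    def body(obs, _):
        action = policy(params, obs)
        next_obs = obs + 0.1 * action  # stand-in for an XLA-callable env step
        return next_obs, (obs, action)
    _, traj = jax.lax.scan(body, init_obs, None, length=num_steps)
    return traj  # whole loop compiles to a single XLA program

traj = jax.jit(rollout)(jnp.ones((4, 4)), jnp.zeros((8, 4)))
```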
I have a question regarding the FACMAC implementation. Did you use any wrappers such as observation/reward normalization or action clipping/rescaling? Because in the original single-agent setting, implementations usually use...
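For reference, a minimal sketch of the kind of wrapper the question refers to: running observation normalization with a Welford-style mean/variance estimate. The names and constants are illustrative, not taken from the FACMAC codebase:

```python
import numpy as np
import gym

class NormalizeObservation(gym.ObservationWrapper):
    """Normalize observations with a running mean/std estimate."""
    def __init__(self, env, eps=1e-8):
        super().__init__(env)
        self.eps = eps
        self.count = 0
        self.mean = np.zeros(env.observation_space.shape)
        self.var = np.ones(env.observation_space.shape)

    def observation(self, obs):
        # Welford-style incremental update of running mean and variance
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count
        return (obs - self.mean) / np.sqrt(self.var + self.eps)

# usage (make_env is a hypothetical constructor):
# env = NormalizeObservation(make_env())
```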