Wei Xiong

Results: 1 issue by Wei Xiong

In the implementation of A2C, the code is:

    policy_loss += self.entropy_weight * -log_prob  # entropy maximization

But I think that, since we are maximizing the entropy, in the loss we shall...
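For reference, here is a minimal sketch of the sign convention the issue is asking about, assuming a categorical policy. It is not the repository's code: the `entropy` and `loss_with_entropy_bonus` helpers and the example distributions are hypothetical. Entropy is H(p) = -sum_i p_i log p_i, and when an action is sampled from the policy, -log_prob is a single-sample estimate of it. A loss that is minimized can therefore only push entropy up if the entropy term enters with a minus sign. Whether the quoted line follows that convention depends on how log_prob and policy_loss are defined in the repository, which the excerpt does not show.

```python
import math


def entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def loss_with_entropy_bonus(base_loss, probs, entropy_weight):
    """Hypothetical helper: subtract the weighted entropy from the loss,
    so that minimizing the loss pushes the entropy up."""
    return base_loss - entropy_weight * entropy(probs)


uniform = [0.25, 0.25, 0.25, 0.25]  # highest-entropy policy over 4 actions
peaked = [0.97, 0.01, 0.01, 0.01]   # nearly deterministic policy

# With the same base loss, the higher-entropy policy gets the lower total loss.
print(loss_with_entropy_bonus(1.0, uniform, 0.01))
print(loss_with_entropy_bonus(1.0, peaked, 0.01))
```

Flipping the sign of the entropy term reverses the ordering of the two losses, which is the kind of mismatch between a comment and the code that the issue raises.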