Wei Xiong
In the implementation of A2C, the code is

    policy_loss += self.entropy_weight * -log_prob  # entropy maximization

But I think that, since we are maximizing the entropy, in the loss we shall...
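
To make the sign question concrete, here is a minimal sketch in plain Python, not taken from the A2C code under discussion. It assumes a toy two-action softmax policy with a single logit `theta`, and the names `entropy`, `entropy_grad`, `descend`, and the `sign` argument are hypothetical. It only illustrates the general convention: when a loss is minimized by gradient descent, an entropy term with a negative sign raises the entropy, and a positive sign lowers it. Recall that the expected value of `-log_prob` under the policy is the entropy, so `-log_prob` of a sampled action acts as a single-sample estimate of it.

```python
import math


def entropy(theta):
    """Entropy of a two-action softmax policy with logits (theta, 0)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def entropy_grad(theta):
    """Analytic derivative dH/dtheta = -theta * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return -theta * p * (1.0 - p)


def descend(theta, sign, entropy_weight=0.5, lr=1.0, steps=50):
    """Run gradient descent on loss = sign * entropy_weight * H(theta)."""
    for _ in range(steps):
        theta -= lr * sign * entropy_weight * entropy_grad(theta)
    return theta


theta0 = 2.0  # start from a fairly peaked policy (hypothetical value)
h_start = entropy(theta0)
h_minus = entropy(descend(theta0, sign=-1.0))  # loss = -w * H
h_plus = entropy(descend(theta0, sign=+1.0))   # loss = +w * H

print(f"start: {h_start:.3f}  minus sign: {h_minus:.3f}  plus sign: {h_plus:.3f}")
```

Under these assumptions, the run with the negative sign ends with higher entropy than the starting policy, and the run with the positive sign ends with lower entropy.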