Madhurjya Pegu

Results: 2 issues by Madhurjya Pegu

From my understanding, the policy network outputs a mean and a variance for a single action. After that, torch.gather is used to calculate the log_prob. Can someone...
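
For context, here is a minimal sketch of the two common ways a log_prob is obtained. The code the question refers to is not visible here, so every tensor name and value below is a hypothetical illustration, not the repository's implementation. With a discrete policy, the network produces one log-probability per action and torch.gather selects the entry for the action taken. With a Gaussian policy that outputs a mean and a variance, the log_prob would normally come from the density itself, for example via torch.distributions.Normal.

```python
import math

import torch
from torch.distributions import Normal

# --- Discrete policy (hypothetical values) ---
# One row per state, one column per action.
logits = torch.tensor([[1.0, 2.0, 0.5],
                       [0.1, 0.2, 0.3]])
# Index of the action taken in each state, shape (batch, 1).
actions = torch.tensor([[1], [2]])

# Log-probabilities over all actions, then pick the one that was taken.
log_probs_all = torch.log_softmax(logits, dim=1)
discrete_log_prob = log_probs_all.gather(1, actions).squeeze(1)

# --- Gaussian policy (hypothetical values) ---
# The network is assumed to output a mean and a variance per state.
mean = torch.tensor([0.0, 1.0])
var = torch.tensor([1.0, 0.25])
action = torch.tensor([0.5, 1.5])

# Normal takes a standard deviation, so take the square root of the variance.
dist = Normal(mean, var.sqrt())
gaussian_log_prob = dist.log_prob(action)

# Same quantity from the closed-form Gaussian log-density.
manual_log_prob = -0.5 * (math.log(2 * math.pi) + var.log()
                          + (action - mean) ** 2 / var)
```

In this sketch, gather is an indexing step over a table of per-action log-probabilities, so it only applies when such a table exists. A network that emits a single mean and variance has no such table, and the log-density is evaluated directly instead.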