Madhurjya Pegu
Results: 2 issues of Madhurjya Pegu
As I understand it, the policy network outputs a mean and a variance for a single action. After that, torch.gather is used to calculate the log_prob. Can someone...
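For anyone reading along, here is a minimal sketch of the two operations the question mentions. It is not the code from the repository under discussion; the tensors `mean`, `var`, `action`, `discrete_log_probs`, and `chosen` are made-up values for illustration. It assumes a Gaussian policy, where the log-probability usually comes from the normal density, and shows for comparison what `torch.gather` does on its own, which is to pick entries by index (a pattern more commonly seen with discrete action distributions).

```python
import math

import torch
from torch.distributions import Normal

# Hypothetical policy outputs for a batch of 3 states, one continuous action each.
mean = torch.tensor([[0.0], [1.0], [-0.5]])
var = torch.tensor([[1.0], [0.25], [4.0]])
action = torch.tensor([[0.5], [1.0], [0.0]])

# Gaussian log-density of each action.
# Note: Normal expects a standard deviation, so the variance is square-rooted.
log_prob = Normal(mean, var.sqrt()).log_prob(action)

# The same quantity written out by hand, to make the formula explicit.
manual = -((action - mean) ** 2) / (2 * var) - 0.5 * torch.log(2 * math.pi * var)

# For comparison: torch.gather selects entries by index along a dimension,
# e.g. the log-probability of the chosen action under a discrete policy.
logits = torch.tensor([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])  # made-up logits
discrete_log_probs = torch.log_softmax(logits, dim=1)
chosen = torch.tensor([[2], [0]])  # index of the action taken in each row
picked = discrete_log_probs.gather(1, chosen)  # shape (2, 1)
```

If the code in question really applies `torch.gather` to a mean/variance output, it would help to check what is being indexed and along which dimension, since `gather` by itself only selects values and does not evaluate a density.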