xzhang
Hi, I notice that in your code, mean_kl is always 0:

    constraint_grad = flat_grad(constraint_loss, self.policy.parameters(), retain_graph=True)  # (b)
    mean_kl = mean_kl_first_fixed(actions_dists, actions_dists)
    Fvp_fun = get_Hvp_fun(mean_kl, self.policy.parameters())

What is the meaning of a...
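If it helps, here is a minimal sketch of the double-backward pattern I think is intended (a toy Normal policy and plain torch.distributions calls, not the repo's mean_kl_first_fixed / get_Hvp_fun): the KL against a detached copy of the same distribution is 0 and its gradient is 0, but its Hessian is the Fisher information matrix, so the Hessian-vector product is still nonzero.

    import torch
    from torch.distributions import Normal
    from torch.distributions.kl import kl_divergence

    mean = torch.zeros(3, requires_grad=True)
    new_dist = Normal(mean, torch.ones(3))
    fixed_dist = Normal(mean.detach(), torch.ones(3))  # "first fixed": no gradient flows through it

    mean_kl = kl_divergence(fixed_dist, new_dist).mean()
    print(mean_kl.item())  # 0.0 at the current parameters, as observed

    # Double backward: the gradient of the KL is also zero here, but grad-of-grad is not.
    grad = torch.autograd.grad(mean_kl, mean, create_graph=True)[0]
    v = torch.randn(3)
    hvp = torch.autograd.grad(torch.dot(grad, v), mean)[0]
    print(hvp)  # nonzero Fisher-vector product, even though mean_kl == 0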
There is no "ant_gather" in the envs folder.
    reward_advs -= reward_advs.mean()
    reward_advs /= reward_advs.std()
    cost_advs -= reward_advs.mean()
    cost_advs /= cost_advs.std()

I guess on the third line it should be the mean of the cost, not reward_advs.mean()?
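For reference, a self-contained sketch of what I assume was intended (dummy tensors of my own; this is my guess at the fix, not the author's confirmed change):

    import torch

    # Dummy advantage estimates, just to make the snippet runnable.
    reward_advs = torch.randn(64)
    cost_advs = torch.randn(64)

    # Standardise each advantage vector with its own statistics.
    reward_advs = (reward_advs - reward_advs.mean()) / reward_advs.std()
    cost_advs = (cost_advs - cost_advs.mean()) / cost_advs.std()  # cost mean, not reward mean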
    log_action_probs = action_dists.log_prob(actions)
    imp_sampling = torch.exp(log_action_probs - log_action_probs.detach())
    # Change to torch.matmul
    reward_loss = -torch.mean(imp_sampling * reward_advs)

Since log_action_probs - log_action_probs.detach() = 0, imp_sampling is an all-ones vector.
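A self-contained toy check of what that block computes (the categorical policy and random advantages are stand-ins of mine, not the repo's classes): the ratio does evaluate to all ones, but the detached term is a constant to autograd, so the surrogate still backpropagates the ordinary policy gradient.

    import torch
    from torch.distributions import Categorical

    logits = torch.randn(5, 3, requires_grad=True)
    action_dists = Categorical(logits=logits)
    actions = action_dists.sample()
    reward_advs = torch.randn(5)

    log_action_probs = action_dists.log_prob(actions)
    imp_sampling = torch.exp(log_action_probs - log_action_probs.detach())
    print(imp_sampling)  # all ones, as noted

    reward_loss = -torch.mean(imp_sampling * reward_advs)
    reward_loss.backward()
    print(logits.grad)   # nonzero: same gradient as -mean(reward_advs * log_action_probs)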