Laurens Weitkamp
Hey @Zhaocka, @Shi-YiWei, I am currently (slowly) working on a new version of this repository on the following branch: https://github.com/lweitkamp/option-critic-pytorch/tree/updated_oc. I could not find a way to remove the issue,...
The issue ended up being how I accessed the state variable, which is a PyTorch tensor. The latest commit fixes the issue!
Hey @manuel-delverme. Do you mean moving lines [112-114](https://github.com/lweitkamp/option-critic-pytorch/blob/0c57da7686f8903ed2d8dded3fae832ee9defd1a/main.py#L112-L114)

```
state = option_critic.get_state(to_tensor(next_obs))
option_termination, greedy_option = option_critic.predict_option_termination(state, current_option)
```

to after the optimisation step?
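In case it helps to see the ordering end to end, here is a runnable toy sketch of that reordering. The `MockOptionCritic` class, the stand-in loss, and all shapes are made up for illustration; only the last two lines correspond to the quoted lines 112-114 of main.py.

```python
import torch

# Minimal mock so the ordering can be shown end to end; this is NOT the
# repository's option-critic class, just a stand-in with the same two
# method names used in the quoted lines.
class MockOptionCritic(torch.nn.Module):
    def __init__(self, obs_dim=4, num_options=2):
        super().__init__()
        self.features = torch.nn.Linear(obs_dim, 8)
        self.q = torch.nn.Linear(8, num_options)
        self.termination = torch.nn.Linear(8, num_options)

    def get_state(self, obs):
        return torch.relu(self.features(obs))

    def predict_option_termination(self, state, option):
        beta = torch.sigmoid(self.termination(state))[option]
        terminate = torch.bernoulli(beta).item() == 1
        greedy_option = self.q(state).argmax().item()
        return terminate, greedy_option


to_tensor = lambda x: torch.as_tensor(x, dtype=torch.float32)
option_critic = MockOptionCritic()
optim = torch.optim.SGD(option_critic.parameters(), lr=1e-3)

next_obs, current_option = torch.randn(4).numpy(), 0

# 1) Optimisation step first (the loss here is a stand-in for the real
#    actor/critic losses in main.py).
loss = option_critic.q(option_critic.get_state(to_tensor(next_obs))).mean()
optim.zero_grad()
loss.backward()
optim.step()

# 2) Only afterwards compute the next-state features and the termination
#    decision, so they reflect the freshly updated parameters.
state = option_critic.get_state(to_tensor(next_obs))
option_termination, greedy_option = option_critic.predict_option_termination(state, current_option)
```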
Fixed as of the latest commit
I'm not so sure this is a problem; it's similar to line 128 in the [original code](https://github.com/jeanharb/option_critic/blob/master/neural_net.py). We are already calculating the Q value for $s$, so perhaps the authors...
Hey, good question. As I recall, the policy-over-options is actually learned off-policy, whereas the intra-option policies are updated on-policy. We only use the replay buffer to update the policy-over-options. I believe they...
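To make that split concrete, here is a small self-contained sketch (not the repository's code; the toy linear critic, the shapes, and the stand-in actor term are all assumptions) of how the two updates use different data: the intra-option actor term uses the transition that was just collected, while the $Q_\Omega$ critic term is regressed on a batch sampled from the replay buffer.

```python
import torch
import torch.nn.functional as F

# Toy setup: 4 options, 8-dim state features, linear Q_Omega critic.
num_options, feat_dim, gamma = 4, 8, 0.99
q_omega = torch.nn.Linear(feat_dim, num_options)
optimizer = torch.optim.RMSprop(q_omega.parameters(), lr=1e-3)

# --- On-policy part: the intra-option policy gradient uses the log-prob and
# --- advantage of the action that was actually just taken (stand-in scalars).
logp = torch.tensor(-0.7, requires_grad=True)   # log pi_omega(a|s) of the taken action
advantage = torch.tensor(0.5)                   # stand-in for Q_U(s, omega, a) minus a baseline
actor_loss = -logp * advantage

# --- Off-policy part: Q_Omega is regressed toward one-step targets computed on
# --- a batch sampled uniformly from the replay buffer, regardless of which
# --- behaviour generated those transitions.
batch_size = 32
phi = torch.randn(batch_size, feat_dim)          # phi(s) for the sampled batch
phi_next = torch.randn(batch_size, feat_dim)     # phi(s')
option = torch.randint(num_options, (batch_size,))
reward = torch.randn(batch_size)
done = torch.zeros(batch_size)
beta_next = torch.rand(batch_size)               # termination prob of the sampled option at s'

with torch.no_grad():
    q_next = q_omega(phi_next)
    # U(omega, s') = (1 - beta) * Q_Omega(s', omega) + beta * max_omega' Q_Omega(s', omega')
    u_next = (1 - beta_next) * q_next.gather(1, option[:, None]).squeeze(1) \
             + beta_next * q_next.max(dim=1).values
    target = reward + gamma * (1 - done) * u_next

q_s_omega = q_omega(phi).gather(1, option[:, None]).squeeze(1)
critic_loss = F.mse_loss(q_s_omega, target)

optimizer.zero_grad()
(actor_loss + critic_loss).backward()
optimizer.step()
```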
Right, so the authors mention that learning both $Q_\Omega$ and $Q_U$ is computationally wasteful and decide to learn only $Q_\Omega$ and to derive an estimate of $Q_U$ from it. So...
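For readers following the thread: as I recall it from the paper, the one-step estimate derives $Q_U$ from $Q_\Omega$ and the termination function $\beta$ roughly as

$$
Q_U(s, \omega, a) \;\approx\; r + \gamma \Big[ \big(1 - \beta_{\omega}(s')\big)\, Q_\Omega(s', \omega) \;+\; \beta_{\omega}(s')\, \max_{\omega'} Q_\Omega(s', \omega') \Big],
$$

i.e. if the option does not terminate at $s'$ we bootstrap from its own value, and if it does we bootstrap from the best available option, so only $Q_\Omega$ (together with $\beta$) actually needs to be learned.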