Claude Formanek
Wrapping the `_policy` function in the executor with `tf.function` also causes retracing, because `_policy` is called inside a `for` loop in the `select_actions` function. It is better to create...
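To make the retracing concrete, here is a minimal sketch that counts traces when a `tf.function` is called once per agent from a Python loop. Everything in it is an assumption for illustration: `policy`, `select_actions`, `select_actions_traced`, and the `agent_N` keys are hypothetical stand-ins, not the executor's actual code, and tracing the whole loop is just one general TensorFlow option, not necessarily the alternative the comment goes on to recommend.

```python
import tensorflow as tf

trace_log = []  # grows each time TensorFlow traces the Python body of `policy`


@tf.function
def policy(agent_key, observation):
    # Hypothetical per-agent policy. The append is a Python side effect,
    # so it runs only while tracing, not on cached graph executions.
    trace_log.append(agent_key)
    return tf.reduce_sum(observation)


def select_actions(observations):
    # Python loop calling the traced function once per agent. The string key
    # is a Python value, so each distinct key produces its own trace.
    return {key: policy(key, obs) for key, obs in observations.items()}


observations = {f"agent_{i}": tf.ones([4]) for i in range(3)}

select_actions(observations)
traces_after_first_call = len(trace_log)   # one trace per distinct agent key

select_actions(observations)
traces_after_second_call = len(trace_log)  # same keys and shapes: reuses cached traces

loop_traces = []


@tf.function
def select_actions_traced(observations):
    # Assumed alternative: trace the whole loop once. The dict is unrolled
    # at trace time into a single graph covering all agents.
    loop_traces.append("traced")
    return {key: tf.reduce_sum(obs) for key, obs in observations.items()}


select_actions_traced(observations)
select_actions_traced(observations)
```

In this sketch the per-agent version is traced once for every distinct key, while the version that wraps the whole loop is traced once in total, provided the input structure, shapes, and dtypes stay the same between calls.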
Reopening this issue until we fix the problem in all other systems.
In my experience, having the reverb sample inside `tf.function` is only a problem when you use a queue. So I expect it to be fine when using a regular replay...
I am trying to use this feature but unfortunately my SD card with the OS flashed on it is not appearing in the SD card dropdown menu. I am not...
Thank you so much for the speedy response. This is helpful and I look forward to the outcomes of your investigation. Just to be clear, in the plot above, the...
I noticed in your MATD3 implementation that you use the environment state in the critic instead of the joint observation. Do you think that the environments should be solvable given...
Thanks for the detailed response. I think your first point speaks to what I wanted to verify, namely that the intended design is that all the relevant information in the...
Hi @zyh1999, I suspect the difference in performance is due to the missing trajectories. The results in the paper used all of the trajectories. Can you try re-running your experiments,...
Hi there, I will be back at my PC on Monday and will be able to investigate the discrepancy in the reported performance for BC on 3m then. But in...
The reason the samples are only portions of an entire trajectory is simply a relic of how my replay buffer was implemented. It was convenient to unroll the recurrent neural...
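As a rough illustration of how a replay buffer can end up serving portions of a trajectory, the sketch below cuts one episode into fixed-length sequences. The function name `split_into_sequences`, the non-overlapping chunking, and the dropping of a short final remainder are all assumptions made for the example; the comment does not describe how the actual buffer slices episodes.

```python
def split_into_sequences(trajectory, sequence_length):
    """Cut a list of timesteps into consecutive, non-overlapping chunks.

    Hypothetical helper: a trailing remainder shorter than
    `sequence_length` is dropped (an assumption, not the source's rule).
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    last_start = len(trajectory) - sequence_length
    return [
        trajectory[start:start + sequence_length]
        for start in range(0, last_start + 1, sequence_length)
    ]


# A 10-step episode stored as sequences of 4 timesteps each.
episode = list(range(10))
sequences = split_into_sequences(episode, 4)
print(sequences)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Each stored sequence can then be fed to a recurrent network as one fixed-length unroll, which is why a sampled item covers only part of the original episode.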