Claude Formanek comments

Results: 32 comments by Claude Formanek

[BUG] Remove nested tf.function

Having `tf.function` over the `_policy` function in the executor causes retracing as well, because `_policy` is called inside a `for` loop in the `select_actions` function. It is better to create...

[BUG] Remove nested tf.function

Reopening this issue until we fix the problem in all other systems.

[BUG]: Remove Reverb sampler from trainer tf.function.

In my experience, having the Reverb sample inside `tf.function` is only a problem when you use a queue. So I expect it to be fine when using a regular replay...

Feature Request: add support for importing customized RPI OS SD cards

I am trying to use this feature but unfortunately my SD card with the OS flashed on it is not appearing in the SD card dropdown menu. I am not...

[Question] Which of the MAMuJoCo environments are even "solvable"?

Thank you so much for the speedy response. This is helpful and I look forward to the outcomes of your investigation. Just to be clear, in the plot above, the...

[Question] Which of the MAMuJoCo environments are even "solvable"?

I noticed in your MATD3 implementation that you use the environment state in the critic instead of the joint observation. Do you think that the environments should be solvable given...

[Question] Which of the MAMuJoCo environments are even "solvable"?

Thanks for the detailed response. I think your first point speaks to what I wanted to verify, namely that the intended design is that all the relevant information in the...

A question about the buffer size

Hi @zyh1999, I suspect the difference in performance is due to the missing trajectories. The results in the paper used all of the trajectories. Can you try re-running your experiments,...

A question about the buffer size

Hi there, I will be back at my PC on Monday and will be able to investigate the discrepancy in the reported performance for BC on 3m then. But in...

A question about the buffer size

The reason the samples are only portions of an entire trajectory is simply a relic of how my replay buffer was implemented. It was convenient to unroll the recurrent neural...