Jonas Eschmann

Results: 28 comments of Jonas Eschmann

I would say `model = model |> gpu` is not a solution because it breaks the correspondence between the parameters and the optimiser state in stateful optimisers. For example, the Adam optimiser uses an IdDict to keep...
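To illustrate the issue (a minimal Python sketch of the analogous failure mode, not Flux.jl code; all names are hypothetical): optimiser state keyed by object identity, as with an IdDict, no longer matches once the parameters are replaced by fresh GPU copies.

```python
import numpy as np

# Hypothetical sketch: Adam-like state keyed by parameter object identity,
# analogous to an IdDict-based optimiser state.
params = [np.zeros(3), np.zeros(2)]
adam_state = {id(p): {"m": np.zeros_like(p)} for p in params}

# Analogue of `model = model |> gpu`: every parameter becomes a *new* array.
gpu_params = [p.copy() for p in params]

# The optimiser can no longer find the state for the new parameters.
assert all(id(p) not in adam_state for p in gpu_params)
```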

Hi @smartengm 1. Do you mean a differentiable model for an MPC? MPCs usually don't have the concept of an (amortized) policy. 2. As mentioned in https://github.com/rl-tools/rl-tools/issues/6#issuecomment-2152531312, it should be...

Hi, I just updated the car example (I hadn't tried it for some time) and added a [Readme](https://github.com/rl-tools/rl-tools/tree/master/src/rl/environments/car). I used `car_track.cpp` to debug the environment when I created it,...

Hi @user-1701 can you elaborate a bit on your use-case? Is it just about pausing and continuing under identical conditions? Because with changes e.g. to the model architecture, replay buffer...

Hi @DaxLynch sorry for the late reply. That sounds like a very cool project! Since TD3 is an off-policy method you can populate the replay buffer however you like. I...
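A rough sketch of that idea (generic Python, not the rl-tools API; `DummyEnv` and `scripted_controller` are made-up placeholders): the replay buffer can be pre-filled with transitions from any behaviour policy before TD3 training starts.

```python
import random

class DummyEnv:
    """Stand-in for any environment with a Gym-style reset/step interface."""
    def reset(self):
        return [0.0]
    def step(self, action):
        return [random.random()], random.random(), random.random() < 0.01, {}

def scripted_controller(obs):
    # Any behaviour policy works for an off-policy method: a PID controller,
    # human demonstrations, random exploration, ...
    return [random.uniform(-1.0, 1.0)]

env = DummyEnv()
replay_buffer = []  # (obs, action, reward, next_obs, done) tuples
obs = env.reset()
for _ in range(10_000):
    action = scripted_controller(obs)
    next_obs, reward, done, _ = env.step(action)
    replay_buffer.append((obs, action, reward, next_obs, done))
    obs = env.reset() if done else next_obs
# TD3, being off-policy, can then learn from this pre-filled buffer as usual.
```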

PS: For the same use case (using MuJoCo as a subdirectory) it would also be great if the `cmake --install` behavior could be completely disabled, e.g. via a flag. I...

@bhack I wanted to make sure the pipeline is bottlenecked by the dataset throughput and not the `reduce_sum`.
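A sketch of the kind of micro-benchmark meant here (hypothetical code, assuming a `tf.data` pipeline; the real dataset would read and decode actual files): the consumer is just `tf.reduce_sum`, cheap enough that the measured time reflects the input pipeline rather than compute.

```python
import time
import tensorflow as tf

# Stand-in pipeline; replace with the real dataset to measure its throughput.
data = tf.random.uniform([1000, 64, 64, 3])
dataset = tf.data.Dataset.from_tensor_slices(data).batch(32).prefetch(tf.data.AUTOTUNE)

start = time.time()
checksum = 0.0
for batch in dataset:
    # Trivial consumer: forces the data to be materialised, but is too cheap
    # to hide an input bottleneck.
    checksum += float(tf.reduce_sum(batch))
elapsed = time.time() - start
print(f"throughput: {1000 / elapsed:.1f} samples/s (checksum {checksum:.3e})")
```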

Hi @Victornovikov! Sorry, I didn't see your PRs on the docs repo. I merged them, thanks for letting me know, and sorry that they were wrong in the first place!

Hi @cnDengyu thank you for pointing this out! Looks like a good solution! Let me know if you want to create a pull request for this; otherwise, I would just...

Thank you for the details! In that case, we should probably just check for `std=0` in all cases. One clarification about the implementation of the distributions: The idea is to...
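For illustration only (a hypothetical Python sketch, not the actual implementation of the distributions): a diagonal-Gaussian log-probability where `std=0` has to be rejected or handled specially, since the density is degenerate there.

```python
import math

def gaussian_log_prob(x, mean, std):
    # Hypothetical sketch: guard against the degenerate std = 0 case up front,
    # since both log(std) and the ((x - mean) / std) term are undefined there.
    if std == 0:
        raise ValueError("std must be strictly positive")
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

print(gaussian_log_prob(0.1, 0.0, 0.5))
```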