I noticed that the code loads pretrained weights during training. I tried training without the pretrained weights, but that appears to be the wrong approach. Here is my result.
Why does the code require only one env when using the RNN policy? https://github.com/marlbenchmark/off-policy/blob/release/offpolicy/scripts/train/train_mpe.py#L154
When I tried to train, it failed with `RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:441`. Can anyone help?
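For reference, a minimal check that exercises the same cuBLAS path outside the training code (a sketch; assumes a CUDA build of PyTorch and at least one visible GPU):

```python
import torch

# Version/device info first: a PyTorch build compiled against a CUDA version
# that mismatches the installed driver is a common cause of cuBLAS failures.
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(torch.cuda.get_device_name(0))

# A bare matrix multiply on the GPU goes through the same cuBLAS GEMM call
# that raised the error; if this also fails, the problem is the environment,
# not the training script.
a = torch.randn(256, 256, device='cuda')
b = torch.randn(256, 256, device='cuda')
torch.cuda.synchronize()
print((a @ b).sum().item())
```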
I retrained PPO in the Treechop environment, but my result differs from the paper: I only reach a final reward of 20. I didn't change anything, so what could the problem be?
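For context, this is roughly how I measure the episode reward (a sketch; assumes the `MineRLTreechop-v0` env id and the old gym step API, with a random action standing in for the trained PPO policy):

```python
import gym
import minerl  # noqa: F401  (registers the MineRL envs)

env = gym.make('MineRLTreechop-v0')
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    # Stand-in for the trained policy; I substitute my PPO agent's action here.
    action = env.action_space.sample()
    obs, reward, done, _ = env.step(action)
    total_reward += reward

print('episode reward:', total_reward)
```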
[Bug Report] Inconsistency between observations and infos in the antmaze-large-diverse-v2 dataset
The observations in AntMaze are laid out as [qpos, qvel], but dataset['observations'] does not match the concatenation of dataset['infos/qpos'] and dataset['infos/qvel'].
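A minimal reproduction of the check (a sketch; assumes d4rl is installed and that the observation is supposed to be the plain concatenation [qpos, qvel]):

```python
import gym
import numpy as np
import d4rl  # noqa: F401  (registers the antmaze envs)

env = gym.make('antmaze-large-diverse-v2')
dataset = env.get_dataset()

obs = dataset['observations']
qpos = dataset['infos/qpos']
qvel = dataset['infos/qvel']

# If observations really were [qpos, qvel], this rebuilt array should match.
rebuilt = np.concatenate([qpos, qvel], axis=1)
print('shapes:', obs.shape, rebuilt.shape)
if obs.shape == rebuilt.shape:
    print('max abs diff:', np.abs(obs - rebuilt).max())
```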
Hello, I used the same config as the repo and matched the good performance reported in the paper. However, when I tried the halfcheetah env, the testing score is...
https://github.com/mmatl/urdfpy/blob/5466842899b33bd549e8f9e2a9a987bd5e37373b/urdfpy/urdf.py#L898: it should be np.float64...
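A minimal illustration of the underlying issue (a sketch, assuming the linked line uses the old `np.float` alias; I have not verified the exact expression at L898):

```python
import numpy as np

# np.float was only a deprecated alias for the builtin float and was removed
# in NumPy 1.24, so any dtype=np.float usage breaks on current NumPy.
x = np.asanyarray([1.0, 2.0], dtype=np.float64)  # the portable spelling
print(x.dtype)  # float64

# On NumPy >= 1.24 the old spelling raises:
#   AttributeError: module 'numpy' has no attribute 'float'
# np.asanyarray([1.0, 2.0], dtype=np.float)
```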