I noticed that the code loads pretrained weights during training. I tried training without the pretrained weights, but that appears to be the wrong approach. Here is my result.
Why does the code require only one env when using the RNN policy? https://github.com/marlbenchmark/off-policy/blob/release/offpolicy/scripts/train/train_mpe.py#L154
When I tried to train, it failed with `RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:441`. Can anyone help?
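For reference, a minimal check that exercises the same cuBLAS path outside the training code (a sketch; assumes a CUDA build of PyTorch and at least one visible GPU):

```python
import torch

# Version/device info first: a PyTorch build compiled against a CUDA version
# that mismatches the installed driver is a common cause of cuBLAS failures.
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(torch.cuda.get_device_name(0))

# A bare matrix multiply on the GPU goes through the same cuBLAS GEMM call
# that raised the error; if this also fails, the problem is the environment,
# not the training script.
a = torch.randn(256, 256, device='cuda')
b = torch.randn(256, 256, device='cuda')
torch.cuda.synchronize()
print((a @ b).sum().item())
```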
I retrained PPO in the Treechop environment, but my result differs from the paper: I only reach a final reward of 20. I didn't change anything, so what could the problem be?
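For context, this is roughly how I measure the episode reward (a sketch; assumes the `MineRLTreechop-v0` env id and the old gym step API, with a random action standing in for the trained PPO policy):

```python
import gym
import minerl  # noqa: F401  (registers the MineRL envs)

env = gym.make('MineRLTreechop-v0')
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    # Stand-in for the trained policy; I substitute my PPO agent's action here.
    action = env.action_space.sample()
    obs, reward, done, _ = env.step(action)
    total_reward += reward

print('episode reward:', total_reward)
```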
[Bug Report] Inconsistency between observations and infos in the antmaze-large-diverse-v2 dataset
The observations in AntMaze are laid out as [qpos, qvel], but dataset['observations'] does not match the concatenation of dataset['infos/qpos'] and dataset['infos/qvel'].
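A minimal reproduction of the check (a sketch; assumes d4rl is installed and that the observation is supposed to be the plain concatenation [qpos, qvel]):

```python
import gym
import numpy as np
import d4rl  # noqa: F401  (registers the antmaze envs)

env = gym.make('antmaze-large-diverse-v2')
dataset = env.get_dataset()

obs = dataset['observations']
qpos = dataset['infos/qpos']
qvel = dataset['infos/qvel']

# If observations really were [qpos, qvel], this rebuilt array should match.
rebuilt = np.concatenate([qpos, qvel], axis=1)
print('shapes:', obs.shape, rebuilt.shape)
if obs.shape == rebuilt.shape:
    print('max abs diff:', np.abs(obs - rebuilt).max())
```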
Hello, I used the same config as the repo and matched the good performance reported in the paper. However, when I tried the halfcheetah env, the testing score is...
https://github.com/mmatl/urdfpy/blob/5466842899b33bd549e8f9e2a9a987bd5e37373b/urdfpy/urdf.py#L898: it should be np.float64...
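A minimal illustration of the underlying issue (a sketch, assuming the linked line uses the old `np.float` alias; I have not verified the exact expression at L898):

```python
import numpy as np

# np.float was only a deprecated alias for the builtin float and was removed
# in NumPy 1.24, so any dtype=np.float usage breaks on current NumPy.
x = np.asanyarray([1.0, 2.0], dtype=np.float64)  # the portable spelling
print(x.dtype)  # float64

# On NumPy >= 1.24 the old spelling raises:
#   AttributeError: module 'numpy' has no attribute 'float'
# np.asanyarray([1.0, 2.0], dtype=np.float)
```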