Xiaohan Wang
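> nice
>
> ### I have solved this issue:
>
> ```python
> import os
>
> # Number GPUs by PCI bus ID so indices match nvidia-smi.
> os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
> # An empty value hides every GPU, so the code falls back to the CPU.
> # Set this before importing the deep learning framework.
> os.environ["CUDA_VISIBLE_DEVICES"] = ""
> ```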
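A minimal sketch of the same workaround, wrapped in a helper so the ordering requirement is explicit. `hide_all_gpus` is a hypothetical name, and the sketch assumes the training script uses a CUDA framework such as PyTorch that reads these variables when it first initializes CUDA:

```python
import os


def hide_all_gpus() -> None:
    """Make CUDA-based frameworks see no GPUs, so they run on the CPU.

    Call this before the framework initializes CUDA, ideally before it
    is imported at all; changing the variables afterwards has no effect.
    """
    # Number devices by PCI bus ID so indices match nvidia-smi.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # An empty list of visible devices hides every GPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


if __name__ == "__main__":
    hide_all_gpus()
    # import torch  # import only after the variables are set;
    # torch.cuda.is_available() should then report False.
    print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```

Note that this runs the whole job on the CPU, which is usually much slower for training.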
To be specific, this issue arises when the baseline model (rollout) remains the best model while the trained model keeps getting worse. I don't know why the backpropagation did...
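For context, here is a minimal sketch of how a greedy rollout baseline typically behaves, assuming this code follows the REINFORCE-with-rollout-baseline scheme of Kool et al. ("Attention, Learn to Solve Routing Problems!"); the function names are hypothetical, and the significance check uses a normal approximation in place of an exact paired t-test. Because the baseline is frozen until a candidate policy is significantly better, it can stay "the best" while the trained policy drifts worse:

```python
import math
import statistics
from typing import Sequence


def reinforce_advantages(costs: Sequence[float],
                         baseline_costs: Sequence[float]) -> list[float]:
    # Per-instance advantage: sampled-solution cost minus the baseline's
    # greedy-rollout cost. Negative means the sample beat the baseline.
    return [c - b for c, b in zip(costs, baseline_costs)]


def should_replace_baseline(candidate_costs: Sequence[float],
                            baseline_costs: Sequence[float],
                            z_crit: float = 1.645) -> bool:
    """Replace the baseline only if the candidate is significantly better.

    One-sided paired test on per-instance cost differences, using a normal
    approximation (z_crit = 1.645 is roughly alpha = 0.05). Needs >= 2 instances.
    """
    diffs = [c - b for c, b in zip(candidate_costs, baseline_costs)]
    mean = statistics.fmean(diffs)
    if mean >= 0:
        return False  # candidate is not better on average: keep the old baseline
    sd = statistics.stdev(diffs)
    if sd == 0:
        return True  # same improvement on every instance
    t = mean / (sd / math.sqrt(len(diffs)))
    return t < -z_crit
```

Under this scheme, a policy that gets worse never replaces the baseline, so the baseline checkpoint looks best even though training itself is degrading.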
I ran into the same problem; the original results of this code are confusing. The rewards stay around 10.0, which is smaller than in the forward episodes.
I have the same issue, haha.