Xiaohan Wang

Results 4 comments of Xiaohan Wang

> nice
> ### I have Solved this question:
> ```python
> import os
>
> os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
> os.environ["CUDA_VISIBLE_DEVICES"]=""
> ```
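
For anyone applying the quoted fix, here is a minimal sketch of how it might be wrapped. The helper name `hide_gpus` is hypothetical and not from the original code. It assumes the variables are set before any CUDA-backed framework is imported, since the CUDA runtime reads them when it initializes.

```python
import os


def hide_gpus():
    """Hypothetical helper: hide all CUDA devices from this process."""
    # Order devices by PCI bus ID so indices match what nvidia-smi reports.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # An empty string exposes no GPUs to CUDA in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    return os.environ["CUDA_DEVICE_ORDER"], os.environ["CUDA_VISIBLE_DEVICES"]


# Assumption: call this before importing any CUDA-backed framework.
settings = hide_gpus()
print(settings)
```

To use a specific GPU instead of hiding all of them, the second variable would presumably be set to that device's index (for example `"0"`) rather than an empty string.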

To be specific, this issue arises when the baseline model (rollout) remains the best, while the model being trained keeps getting worse. I don't know why the backpropagation did...

I ran into the same problem; the original result of this code is confusing. The rewards end up around 10.0, smaller than in the forward episodes.