Distributed RL - Model not converging

Open NextSim opened this issue 7 years ago • 0 comments

Problem description

With transfer learning, the model has not converged (or shown any real sign of learning) after almost 24 hours of training.

Problem details

I am trying to run a local training job with transfer learning. After letting the model train for almost a day, I see no improvement in performance at all. The PC I am using has 32 GB of RAM and an NVIDIA GeForce GTX 980 Ti. I modified distributed_agent.py to log a few values that I can view in TensorBoard. For the averages, I used a moving average with a window of 50 epochs. I added the TensorBoard code quickly, but I think it is correct. I've attached the TensorBoard output, the modified distributed_agent.py, and my train.bat.
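
For illustration, a 50-epoch moving average of the kind described above could look roughly like the sketch below. This is not the code from the attached distributed_agent.py; the class name `MovingAverage` and the sample reward values are hypothetical, and the TensorBoard logging step is left out.

```python
from collections import deque


class MovingAverage:
    """Running mean over the most recent `window` values (hypothetical helper)."""

    def __init__(self, window=50):
        self.values = deque(maxlen=window)
        self.total = 0.0

    def update(self, value):
        # Once the window is full, remove the oldest value from the running sum
        # before the deque drops it on append.
        if len(self.values) == self.values.maxlen:
            self.total -= self.values[0]
        self.values.append(value)
        self.total += value
        return self.total / len(self.values)


# Made-up per-epoch rewards with a small window, only to show the call pattern.
avg = MovingAverage(window=3)
smoothed = [avg.update(r) for r in [1.0, 2.0, 3.0, 4.0]]
```

With a window of 50, the first 49 points average fewer samples than the rest, so the start of the plotted curve will look noisier than later epochs.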

Experiment/Environment details

Tutorial used: DistributedRL
Environment used: Neighborhood

Files.zip

Nov 07 '18 14:11 NextSim