AutonomousDrivingCookbook
Distributed RL - Model not converging
Problem description
With transfer learning, the model has not converged (or visibly learned anything) after almost 24 hours of training.
Problem details
I am trying to run a local training job with transfer learning. After letting the model train for almost a day, there is no improvement in performance at all. The PC I am using has 32 GB of RAM and an NVIDIA GeForce GTX 980 Ti. I modified distributed_agent.py to plot a few values that I can view in TensorBoard. For the averages, I used a moving average with a window of 50 epochs. I added the TensorBoard code quickly, but I think it is OK. I've attached the TensorBoard output, the modified distributed_agent.py, and my train.bat.
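For reference, a moving average over the last 50 epochs can be sketched roughly as below. This is only an illustration of the smoothing described above, not the code from the attached distributed_agent.py; the class name `MovingAverage` and its `update` method are hypothetical.

```python
from collections import deque


class MovingAverage:
    """Running mean over the most recent `window` values (hypothetical helper)."""

    def __init__(self, window=50):
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen drops the oldest value automatically on append
        self._values = deque(maxlen=window)
        self._total = 0.0

    def update(self, value):
        # If the buffer is full, the oldest value is about to be evicted,
        # so remove it from the running total first.
        if len(self._values) == self._values.maxlen:
            self._total -= self._values[0]
        self._values.append(value)
        self._total += value
        # Until the window fills, average over the values seen so far.
        return self._total / len(self._values)
```

Calling `update(reward)` once per epoch returns the smoothed value to log. Note that for the first 49 epochs the average covers fewer than 50 values, so the early part of the curve is noisier than the rest.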
Experiment/Environment details
- Tutorial used: DistributedRL
- Environment used: Neighborhood