Simulator Visualization During Training
Does it make sense for the visualization to be on while experience tuples are being generated?
We can just visualize during evaluation, since that will show how well the policy is doing during that training cycle.
Yep, I think dynamic visualization of (1) the loss, (2) the reward per episode, and (3) the average Q-value would be awesome. Any ideas on the best way to implement that?
Dynamic plots of the loss, etc., would also be a great idea.
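One possible starting point for the plots, as a rough sketch rather than a finished design: keep the bookkeeping separate from the plotting, so that whichever plotting library ends up being used only has to read a list of numbers. `MetricTracker`, its `window` parameter, and the metric names below are all made up for illustration; nothing here comes from the existing code.

```python
from collections import defaultdict, deque


class MetricTracker:
    """Hypothetical helper: stores each metric's full series plus a moving window."""

    def __init__(self, window=100):
        self.window = window
        # Full series per metric, e.g. to hand to a plotting library.
        self.history = defaultdict(list)
        # Only the most recent `window` values, for a smoothed readout.
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def log(self, name, value):
        value = float(value)
        self.history[name].append(value)
        self.recent[name].append(value)

    def moving_average(self, name):
        values = self.recent[name]
        # None signals that nothing has been logged under this name yet.
        return sum(values) / len(values) if values else None


# Example with made-up numbers: log a loss every step, a reward every episode.
tracker = MetricTracker(window=3)
for loss in [4.0, 2.0, 3.0, 1.0]:
    tracker.log("loss", loss)
tracker.log("episode_reward", 21.0)
```

With `window=3`, the moving average of the loss covers only the last three values (2.0, 3.0, 1.0). The same `log` call would work for the average Q-value, assuming the agent exposes it somewhere.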
For this issue, I meant the simulator visualization, e.g. the Atari screen. I'm not sure it makes sense to visualize during training, but visualization during evaluation might be useful?
Ah, got it. In my experience, the evaluation phase passes too quickly, so the visualization appears only for a very short time. Training might take a pretty long time, and seeing the algorithm play helps mentally :)
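Maybe both preferences can be covered with a small switch: always render during evaluation, and optionally render every N-th training episode. A minimal sketch, assuming the training loop knows its current phase and episode index; `should_render`, `render_eval`, and `render_train_every` are hypothetical names, not existing options.

```python
def should_render(phase, episode, render_eval=True, render_train_every=0):
    """Decide whether to show the simulator screen for this episode.

    phase: "train" or "eval" (assumed labels for the two loops).
    render_train_every: 0 disables rendering during training;
        N > 0 renders every N-th training episode.
    """
    if phase == "eval":
        return render_eval
    if phase == "train":
        return render_train_every > 0 and episode % render_train_every == 0
    raise ValueError(f"unknown phase: {phase!r}")
```

With the defaults, training stays headless while experience tuples are generated and only evaluation is shown. Passing something like `render_train_every=50` would show an occasional training episode without slowing down every one.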