Set up short-term replay buffer

Open philkuz opened this issue 9 years ago • 0 comments

We need to be able to do batched conditionals in TensorFlow.

Currently, we aren't calculating the gamma loss with the reward function.
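A rough, framework-free sketch of what the two points above could look like together. It assumes "gamma" means the discount factor in a one-step TD target and that the conditional is the usual terminal-state check; the issue doesn't confirm either. The names `td_targets` and `squared_td_loss` are hypothetical. In TensorFlow, the per-element selection would typically be done with `tf.where(condition, x, y)` rather than a Python loop.

```python
def td_targets(rewards, next_values, dones, gamma=0.99):
    """Per-sample target: r if the episode ended, else r + gamma * V(s').

    Assumption: a one-step TD target; the issue does not say which
    target the critic should use.
    """
    targets = []
    for r, v_next, done in zip(rewards, next_values, dones):
        # The conditional is evaluated independently for each batch element.
        targets.append(r if done else r + gamma * v_next)
    return targets


def squared_td_loss(values, targets):
    """Mean squared error between critic predictions and their targets."""
    return sum((v - t) ** 2 for v, t in zip(values, targets)) / len(values)


if __name__ == "__main__":
    targets = td_targets(
        rewards=[1.0, 2.0],
        next_values=[10.0, 10.0],
        dones=[False, True],
        gamma=0.5,
    )
    print(targets)
    print(squared_td_loss([6.0, 0.0], targets))
```

The terminal sample ignores `next_values` entirely, which is the part that needs an elementwise conditional once the batch size is larger than 1.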

Add a replay_memory to the subcritic network instead of the polynomial critic network.

Use a mini-batch of 64 instead of 1 (switch from online to mini-batch updates).
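A minimal sketch of a short-term replay memory that supports this, using only the standard library. The class name `ReplayMemory`, the transition layout `(state, action, reward, next_state, done)`, and the default capacity are assumptions for illustration, not the project's actual interface; only the batch size of 64 comes from this issue.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of transitions with uniform random sampling."""

    def __init__(self, capacity=10000, seed=None):
        # A deque with maxlen drops the oldest transition once full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._buffer)

    def add(self, state, action, reward, next_state, done):
        """Store one transition (hypothetical layout)."""
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        """Return batch_size transitions, sampled without replacement,
        regrouped as five parallel tuples."""
        batch = self._rng.sample(list(self._buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones


if __name__ == "__main__":
    memory = ReplayMemory(capacity=100, seed=0)
    for step in range(150):
        memory.add(step, 0, 1.0, step + 1, False)
    states, actions, rewards, next_states, dones = memory.sample(64)
    print(len(memory), len(states))
```

Training would then wait until `len(memory) >= 64` before drawing a batch, since `sample` raises `ValueError` if asked for more transitions than are stored.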

Oct 11 '16 04:10 philkuz