openbrain
Set up a short-term replay buffer.
Need to be able to do batched conditionals in TensorFlow.
Currently we aren't calculating the gamma (discount-factor) loss from the reward function.
Add a replay_memory to the subcritic network instead of the polynomial critic network.
Use a mini-batch of 64 instead of 1 (switch from online to mini-batch updates).
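For the short-term replay buffer item, a minimal sketch in plain Python, assuming transitions are stored as (state, action, reward, next_state, done) tuples and that "short-term" means a fixed-capacity FIFO where the oldest entries are evicted. The class name ShortTermReplayBuffer and the tuple layout are hypothetical, not taken from the openbrain code.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, done).
Transition = Tuple[list, int, float, list, bool]


class ShortTermReplayBuffer:
    """Fixed-capacity FIFO buffer: once full, the oldest transition is dropped."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) discards from the left when a new item is appended.
        self.memory: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done) -> None:
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, capped at what is stored.
        k = min(batch_size, len(self.memory))
        return self.rng.sample(list(self.memory), k)

    def __len__(self) -> int:
        return len(self.memory)


# Usage: with capacity 3, only the three most recent transitions survive.
buf = ShortTermReplayBuffer(capacity=3)
for t in range(5):
    buf.add([t], 0, float(t), [t + 1], False)
```

The same object could serve as the replay_memory attached to the subcritic network; sampling without replacement keeps a single mini-batch free of duplicate transitions.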
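For the batched-conditionals item, the usual pattern is to replace a single Python `if` (which branches once for the whole batch) with an elementwise select driven by a boolean mask. In TensorFlow that op is tf.where(condition, x, y). The sketch below reproduces the same semantics in plain Python so the idea can be checked without TensorFlow; the function names are hypothetical.

```python
from typing import List


def batched_select(cond: List[bool], x: List[float], y: List[float]) -> List[float]:
    """Elementwise select: out[i] = x[i] if cond[i] else y[i].

    Same semantics as tf.where(cond, x, y): both branches are computed for
    every example and the mask picks one value per element.
    """
    if not (len(cond) == len(x) == len(y)):
        raise ValueError("cond, x and y must have the same batch length")
    return [xi if ci else yi for ci, xi, yi in zip(cond, x, y)]


def masked_select(cond: List[bool], x: List[float], y: List[float]) -> List[float]:
    """Equivalent selection using only multiply/add with a 0/1 mask
    (valid as long as the unselected values are finite)."""
    return [c * xi + (1.0 - c) * yi for c, xi, yi in zip(map(float, cond), x, y)]
```

The arithmetic form is handy when the conditional feeds straight into a loss, for example zeroing the bootstrap term on terminal transitions.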
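For the missing gamma loss, one common reading is the one-step temporal-difference (TD) loss: the critic's target is the reward plus the gamma-discounted value of the next state, with the bootstrap term dropped on terminal steps. That interpretation is an assumption; the sketch below shows it in plain Python with hypothetical function names.

```python
from typing import List


def td_targets(rewards: List[float], next_values: List[float],
               dones: List[bool], gamma: float = 0.99) -> List[float]:
    """One-step TD targets: r + gamma * V(s') for non-terminal steps, r for terminal ones."""
    return [r + (0.0 if d else gamma * v) for r, v, d in zip(rewards, next_values, dones)]


def td_loss(values: List[float], targets: List[float]) -> float:
    """Mean squared TD error between the critic's V(s) and the targets."""
    return sum((t - v) ** 2 for v, t in zip(values, targets)) / len(values)
```

In a framework the targets would normally be treated as constants (no gradient flows through V(s')), so only V(s) is pushed toward the target.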
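For the move from online updates to mini-batches of 64, the core change is averaging per-example gradients over a batch before each step; with batch_size=1 the same code reduces to the online update. The sketch uses a toy 1-D linear model as a stand-in for the subcritic network; the model, learning rate, and function names are all hypothetical.

```python
import random
from typing import List, Tuple

BATCH_SIZE = 64  # previously 1, i.e. purely online updates


def minibatch_step(w: float, batch: List[Tuple[float, float]], lr: float) -> float:
    """One SGD step for y_hat = w * x under squared error.

    Per-example gradients 2 * (w*x - y) * x are averaged over the batch.
    """
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def train(data: List[Tuple[float, float]], epochs: int = 20, lr: float = 0.5,
          batch_size: int = BATCH_SIZE, seed: int = 0) -> float:
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)  # reshuffle each epoch so batches differ
        for i in range(0, len(order), batch_size):
            w = minibatch_step(w, order[i:i + batch_size], lr)
    return w


# Usage: noiseless data from y = 3x, so w should approach 3.0.
data = [(i / 255, 3.0 * i / 255) for i in range(256)]
w = train(data)
```

Averaging (rather than summing) keeps the step size comparable to the online case, so the learning rate does not have to be rescaled by 64.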