Training mode and Q-policies
Q-policies have no way of knowing whether the Q-network underlying them should be called with training=True or training=False. By default they implicitly always call the network in inference mode, and it is not clear to me why one would want that as the default, since layers such as dropout and batch normalization behave differently in the two modes (illustrated in the sketch further below). This means that in a standard scenario like
from tf_agents.agents.dqn.dqn_agent import DqnAgent
from tf_agents.drivers.dynamic_episode_driver import DynamicEpisodeDriver

tf_env = ...  # some TFEnvironment
replay_buffer = ...  # e.g. a TFUniformReplayBuffer
tf_agent = DqnAgent(...)
collect_policy = tf_agent.collect_policy
replay_observer = [replay_buffer.add_batch]
collection_step = DynamicEpisodeDriver(
    tf_env, collect_policy, observers=replay_observer).run(num_episodes=100)
one would end up with a replay buffer filled with transitions that were sampled in inference mode, which may be undesirable given that the agent will most likely be trained on them.
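
To make the distinction concrete, here is a minimal Keras sketch (plain TensorFlow, not TF-Agents specific) showing that the same network produces different outputs depending on the training flag; any Q-network containing such layers is affected the same way:

import tensorflow as tf

# A toy network with a layer whose behavior depends on the training flag.
net = tf.keras.Sequential([tf.keras.layers.Dropout(0.5)])
x = tf.ones((1, 4))

print(net(x, training=False))  # inference mode: dropout is a no-op, output equals x
print(net(x, training=True))   # training mode: ~half the units zeroed, the rest scaled by 2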
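
For contrast, here is roughly how those transitions would then be consumed, following the standard DQN tutorial pattern (batch size and num_steps are placeholder values): the agent's train step runs the network in training mode, so the data is collected under one mode and learned from under the other.

# Sample the collected transitions and train on them.
dataset = replay_buffer.as_dataset(
    sample_batch_size=64, num_steps=2, num_parallel_calls=3).prefetch(3)
iterator = iter(dataset)
experience, _ = next(iterator)
train_loss = tf_agent.train(experience).loss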