alphadev
Is it correct to call `step` on the environment again at the last leaf?
```python
while node.expanded():
  action, node = _select_child(config, node, min_max_stats)
  sim_env.step(action)
  history.add_action(action)
  search_path.append(node)

# Inside the search tree we use the environment to obtain the next
# observation and reward given an action.
observation, reward = sim_env.step(action)
```
Line 1031: is it correct to call `sim_env.step(action)` again after the loop ends? The `while` loop has already stepped the environment with the action that led to the final leaf, so the extra call seems to apply that same action a second time at the leaf.
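A minimal sketch of the concern, under stated assumptions: `ToyEnv`, `Node`, `select_child`, `descend_as_quoted` and `descend_single_step` are hypothetical names, not part of the alphadev code. `select_child` simply picks the smallest action instead of the real selection score, and `step` is assumed to mutate the environment on every call (i.e. it is not idempotent). With a toy environment that only records the actions it receives, the quoted pattern applies the final action twice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class ToyEnv:
  """Hypothetical stand-in for sim_env: records every action it is stepped with."""

  def __init__(self) -> None:
    self.applied: List[int] = []

  def step(self, action: int) -> Tuple[List[int], float]:
    self.applied.append(action)
    # Observation is the full action history; reward is a dummy constant.
    return list(self.applied), 0.0


@dataclass
class Node:
  """Hypothetical tree node; a node counts as expanded once it has children."""
  children: Dict[int, "Node"] = field(default_factory=dict)

  def expanded(self) -> bool:
    return bool(self.children)


def select_child(node: Node) -> Tuple[int, Node]:
  # Stand-in for _select_child: deterministically pick the smallest action
  # instead of the score-based selection used in the real search.
  action = min(node.children)
  return action, node.children[action]


def descend_as_quoted(root: Node, env: ToyEnv) -> List[int]:
  """Mirrors the quoted snippet: step inside the loop, then once more after it."""
  node = root
  action: Optional[int] = None
  while node.expanded():
    action, node = select_child(node)
    env.step(action)
  observation, _ = env.step(action)  # second step with the same action
  return observation


def descend_single_step(root: Node, env: ToyEnv) -> List[int]:
  """Alternative: step exactly once per edge and keep the last result."""
  node = root
  observation: List[int] = []
  while node.expanded():
    action, node = select_child(node)
    observation, _ = env.step(action)
  return observation


# Build the path root -(3)-> a -(5)-> leaf.
leaf = Node()
root = Node(children={3: Node(children={5: leaf})})

print(descend_as_quoted(root, ToyEnv()))    # [3, 5, 5]
print(descend_single_step(root, ToyEnv()))  # [3, 5]
```

The second variant keeps the `(observation, reward)` returned by the last in-loop `step`, so each edge on the search path is applied to the environment exactly once.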