DeepPath
Retraining code is a little different from the algorithm description.
In the retraining code in policy_agent.py, why is there BFS teacher-guided training after the agent fails? This is not the same as the algorithm description. Does this mean BFS is the upper bound for the RL agent?