Retraining code is a little different from the algorithm description

In policy_agent.py, the retraining code runs BFS teacher-guided training after the agent fails. Why is that step there? It is not part of the algorithm description. Does this mean BFS acts as an upper bound for the RL agent?

Apr 14 '21 13:04 zdh2292390