Q-learning formula

Open Unamu7simure opened this issue 3 years ago • 0 comments

The Q-learning formula (18.3.10) seems to apply only to non-terminal states. If S_t is a terminal state (gold or traps), its Q-table entries should not be updated and should keep their initial values (zeros). Consequently, when a step ends the episode (done is true), the target should be the reward alone, with no bootstrapped term. The code in the _learn method of the Agent class could be revised as follows:

    if done:
        q_target = r
    else:
        q_target = r + self.gamma * np.max(q_table[next_s])
    q_table[s][a] += self.lr * (q_target - q_val)
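
To try the suggested revision in isolation, here is a minimal standalone sketch. It is not the book's Agent class: the function name q_learning_update, the NumPy array used as the Q-table, and the default lr and gamma values are assumptions made purely for illustration.

```python
import numpy as np


def q_learning_update(q_table, s, a, r, next_s, done, lr=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the target.

    Hypothetical helper: q_table is assumed to be a 2-D NumPy array
    indexed as [state, action]; lr and gamma defaults are illustrative.
    """
    q_val = q_table[s, a]
    if done:
        # next_s is terminal: no future return, so do not bootstrap.
        q_target = r
    else:
        # Non-terminal: bootstrap from the best action value in next_s.
        q_target = r + gamma * np.max(q_table[next_s])
    q_table[s, a] += lr * (q_target - q_val)
    return q_target


# Hypothetical 3-state, 2-action table; state 2 plays the terminal state.
q_table = np.zeros((3, 2))
q_table[1] = [0.5, 1.0]

# Step from state 0 into the terminal state 2 with reward 1.
q_learning_update(q_table, s=0, a=1, r=1.0, next_s=2, done=True, lr=0.5)

# Step from state 0 into the non-terminal state 1 with reward 0.
q_learning_update(q_table, s=0, a=0, r=0.0, next_s=1, done=False, lr=0.5)
```

Because the function only ever writes to the row of the state being left, and an episode never leaves a terminal state, the terminal row is never written to and stays at its initial zeros.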

Apr 22 '22 05:04 Unamu7simure