Multi-agent-RL
Collection of RL & Multi-Agent RL projects, from basic algorithms to MADDPG. Implements value iteration, policy iteration, DQN, DDPG, and explores multi-agent cooperation in continuous/discrete action...
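The reward list in TD-Linear is initialized incorrectly: its order is inconsistent with the order of the reward list used when GridEnv initializes the PSA matrix. As a result, the `policy_evaluation` function in TD-Linear does not produce the correct state values.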
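To illustrate why the ordering matters, here is a minimal sketch under stated assumptions: rewards are looked up by index, so the probability of "reward index k" only makes sense against the same list the environment used. The names `expected_reward`, `ENV_REWARD_LIST`, and the reward values are hypothetical and are not taken from the repository.

```python
def expected_reward(reward_probs, reward_list):
    """Expected immediate reward, where reward_probs[k] is the
    probability of receiving reward_list[k]."""
    if len(reward_probs) != len(reward_list):
        raise ValueError("reward_probs and reward_list must have the same length")
    return sum(p * r for p, r in zip(reward_probs, reward_list))


# Hypothetical ordering used when the environment built its matrices.
ENV_REWARD_LIST = [0.0, 1.0, -10.0]

# The environment says: reward index 1 is received with certainty.
reward_probs = [0.0, 1.0, 0.0]

# Same ordering as the environment: index 1 maps to 1.0.
consistent = expected_reward(reward_probs, ENV_REWARD_LIST)

# A reordered list silently maps index 1 to a different reward.
mismatched = expected_reward(reward_probs, [1.0, 0.0, -10.0])
```

Under these assumptions the two calls disagree even though the probabilities are identical, which is the kind of error that would propagate into the evaluated state values. Sharing one reward list between the environment and the learner would avoid it.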
RL_Learning-main/scripts/Chapter5_Monte Carlo Methods/MC_Basic.py

## Current problematic code:

```python
sum_qvalue_list = []
for each_episode in episodes:
    sum_qvalue = 0
    for i in range(len(each_episode)):
        sum_qvalue += (self.gama**i) * each_episode[i]['reward']
sum_qvalue_list.append(sum_qvalue)  # ❌ Error location: outside the loop
self.qvalue[state][action]...
```
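A minimal sketch of how the accumulation might look with the append moved inside the episode loop, assuming each episode is a list of step dictionaries with a `'reward'` key as in the snippet above. The function name `mean_discounted_return` and the standalone `gamma` parameter are hypothetical stand-ins for the repository's `self.gama` and its `self.qvalue[state][action]` update, whose full form is not shown in the report.

```python
def mean_discounted_return(episodes, gamma):
    """Average the discounted return over all sampled episodes."""
    if not episodes:
        raise ValueError("at least one episode is required")
    returns = []
    for episode in episodes:
        discounted_return = 0.0
        for i, step in enumerate(episode):
            discounted_return += (gamma ** i) * step["reward"]
        # Append inside the episode loop so every episode contributes,
        # not only the last one.
        returns.append(discounted_return)
    return sum(returns) / len(returns)


# Two short hypothetical episodes sampled from the same (state, action) pair.
episodes = [
    [{"reward": 1.0}, {"reward": 1.0}],
    [{"reward": 0.0}, {"reward": 2.0}],
]
q_estimate = mean_discounted_return(episodes, gamma=0.5)
```

With the append outside the loop, only the final episode's return would reach the list, so the Monte Carlo estimate would ignore every other sampled episode.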