policy_gradient does not learn
https://github.com/ljpzzz/machinelearning/blob/master/reinforcement-learning/policy_gradient.py
I copied the script and ran it unmodified. The evaluation reward drops from 15.0 to around 9 and stays there through episode 2900:

episode: 0 Evaluation Average Reward: 15.0
episode: 100 Evaluation Average Reward: 10.2
episode: 200 Evaluation Average Reward: 9.2
episode: 300 Evaluation Average Reward: 9.3
episode: 400 Evaluation Average Reward: 9.4
episode: 500 Evaluation Average Reward: 9.0
episode: 600 Evaluation Average Reward: 9.6
episode: 700 Evaluation Average Reward: 9.4
episode: 800 Evaluation Average Reward: 9.7
episode: 900 Evaluation Average Reward: 9.5
episode: 1000 Evaluation Average Reward: 9.6
episode: 1100 Evaluation Average Reward: 9.7
episode: 1200 Evaluation Average Reward: 9.1
episode: 1300 Evaluation Average Reward: 9.2
episode: 1400 Evaluation Average Reward: 9.3
episode: 1500 Evaluation Average Reward: 9.3
episode: 1600 Evaluation Average Reward: 9.4
episode: 1700 Evaluation Average Reward: 9.3
episode: 1800 Evaluation Average Reward: 9.4
episode: 1900 Evaluation Average Reward: 8.8
episode: 2000 Evaluation Average Reward: 9.3
episode: 2100 Evaluation Average Reward: 9.4
episode: 2200 Evaluation Average Reward: 9.6
episode: 2300 Evaluation Average Reward: 9.6
episode: 2400 Evaluation Average Reward: 9.3
episode: 2500 Evaluation Average Reward: 9.3
episode: 2600 Evaluation Average Reward: 9.4
episode: 2700 Evaluation Average Reward: 9.7
episode: 2800 Evaluation Average Reward: 9.6
episode: 2900 Evaluation Average Reward: 9.8