Policy-Gradient-Reinforcement-Learning

This repository contains code for Policy Gradient Methods in Reinforcement Learning

Islam R., Lever G., Shawe-Taylor J., Improving Convergence of Deterministic Policy Gradient Methods in Reinforcement Learning. 2015

  1. Stochastic Policy Gradients
  2. Deterministic Policy Gradients
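
For orientation, the two gradient families listed above are usually written as follows. This is the standard textbook form rather than anything taken from this repository's code; the symbols (policy parameters $\theta$, stochastic policy $\pi_\theta$, deterministic policy $\mu_\theta$, state distribution $\rho$, action-value function $Q$) are notation assumed here for illustration.

$$
\nabla_\theta J(\pi_\theta) = \mathbb{E}_{s \sim \rho^{\pi},\, a \sim \pi_\theta}\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a) \right]
$$

$$
\nabla_\theta J(\mu_\theta) = \mathbb{E}_{s \sim \rho^{\mu}}\left[ \nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s, a) \big|_{a = \mu_\theta(s)} \right]
$$

The stochastic form averages over both states and sampled actions, while the deterministic form averages over states only and follows the critic's gradient with respect to the action.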

This repo contains code for actor-critic policy gradient methods in reinforcement learning, using least-squares temporal difference (LSTD) learning with a linear function approximator as the critic.
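
As a rough illustration of a least-squares temporal difference critic with linear features, here is a minimal sketch in plain Python. It estimates a state-value function rather than an action-value function to keep it short, and the names `lstd`, `solve`, `phi` and the small ridge term `reg` are assumptions made for this example, not identifiers from this repository.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lstd(transitions, phi, n_features, gamma=0.9, reg=1e-6):
    """LSTD(0): solve A w = b with
    A = sum phi(s) (phi(s) - gamma * phi(s'))^T and b = sum phi(s) * r.

    transitions: iterable of (state, reward, next_state, done) tuples.
    phi: hypothetical feature map returning a list of n_features floats.
    reg: small ridge term (an assumption here) that keeps A invertible.
    """
    A = [[reg if i == j else 0.0 for j in range(n_features)]
         for i in range(n_features)]
    b = [0.0] * n_features
    for s, r, s_next, done in transitions:
        f = phi(s)
        # Terminal states have value zero, so their features are dropped.
        f_next = [0.0] * n_features if done else phi(s_next)
        for i in range(n_features):
            b[i] += f[i] * r
            for j in range(n_features):
                A[i][j] += f[i] * (f[j] - gamma * f_next[j])
    return solve(A, b)
```

With one-hot features on a two-step chain (state 0 to state 1 with reward 0, then state 1 to a terminal state with reward 1) and `gamma=0.9`, the returned weights are approximately 0.9 and 1.0, which are the discounted returns from each state.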

The algorithms we consider include:

  1. Episodic REINFORCE (Monte-Carlo) Actor-Critic Stochastic Policy Gradient
  2. Stochastic Off-Policy Actor-Critic Policy Gradient
  3. Deterministic Policy Gradients
  4. Deterministic Gradients with Stochastic Exploration
  5. Natural Stochastic Policy Gradients
  6. Natural Deterministic Policy Gradients
  7. Deterministic Gradients with Adaptive Step Size Gradient Ascent
  8. Deterministic Gradients with Momentum-Based Nesterov's Accelerated Gradient
  9. Stochastic Gradients with Momentum-Based Nesterov's Accelerated Gradient
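
To make two of the items above concrete, the following sketch combines a deterministic policy gradient with Nesterov's accelerated gradient ascent on a toy one-step problem. Everything in it is an assumption chosen for illustration: the linear policy `mu(s) = theta * s`, the known critic `Q(s, a) = -(a - 2 s)^2`, the function names and the step sizes. The repository's own algorithms learn the critic instead of assuming it.

```python
def dpg_gradient(theta, states):
    """Deterministic policy gradient for the toy setup:
    mean over states of (d mu / d theta) * (d Q / d a) evaluated at a = mu(s).
    """
    total = 0.0
    for s in states:
        a = theta * s                  # action from the deterministic policy
        dq_da = -2.0 * (a - 2.0 * s)   # gradient of the toy critic w.r.t. the action
        dmu_dtheta = s                 # gradient of the policy w.r.t. its parameter
        total += dmu_dtheta * dq_da
    return total / len(states)


def nesterov_ascent(grad_fn, theta, lr=0.1, momentum=0.9, steps=300):
    """Gradient ascent with Nesterov momentum: the gradient is evaluated at
    the look-ahead point theta + momentum * velocity, not at theta itself."""
    velocity = 0.0
    for _ in range(steps):
        lookahead = theta + momentum * velocity
        velocity = momentum * velocity + lr * grad_fn(lookahead)
        theta += velocity
    return theta


states = [-1.0, -0.5, 0.5, 1.0]
theta_star = nesterov_ascent(lambda th: dpg_gradient(th, states), theta=0.0)
```

In this toy problem the critic is maximised by `a = 2 s`, so `theta_star` ends up very close to 2. Setting `momentum=0.0` reduces the update to plain gradient ascent, which is a convenient baseline for comparing convergence speed.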

We consider the following MDPs using a Parameterized Controller (Agent):

  1. Toy MDP
  2. Grid World (10x10) MDP
  3. Mountain Car MDP
  4. Cart Pole MDP
  5. Pendulum MDP
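
For readers who want a feel for the simplest of these environments, here is a minimal grid-world sketch with a `reset`/`step` interface. The start cell, goal cell, four-action set and reward of -1 per step are all assumptions made for this example; the environments in this repository may define them differently.

```python
class GridWorld:
    """Hypothetical 10x10 grid world: start at the top-left, goal at the bottom-right."""

    # Action index -> (row delta, column delta): up, down, left, right.
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=10, start=(0, 0), goal=None):
        self.size = size
        self.start = start
        self.goal = goal if goal is not None else (size - 1, size - 1)
        self.state = start

    def reset(self):
        """Return the agent to the start cell and return that cell."""
        self.state = self.start
        return self.state

    def step(self, action):
        """Apply an action; moves that would leave the grid keep the agent at the wall."""
        d_row, d_col = self.ACTIONS[action]
        row = min(max(self.state[0] + d_row, 0), self.size - 1)
        col = min(max(self.state[1] + d_col, 0), self.size - 1)
        self.state = (row, col)
        done = self.state == self.goal
        reward = 0.0 if done else -1.0  # assumed cost of -1 per non-terminal step
        return self.state, reward, done
```

A parameterized controller would call `reset()` once per episode and then `step(action)` until `done` is true, accumulating the rewards as the episode return.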