This repository builds on code from earlier in my learning journey. I am looking to explore more techniques in the future and will likely create a new repository for that purpose. That said, I will be lightly maintaining this one, so feel free to contribute or raise issues.
Installation | Usage | Project Structure | License | Contributing | Acknowledgments | References
This repository contains an implementation of a Deep Q-Network (DQN) agent for reinforcement learning tasks, designed for OpenAI Gym's classic control environments with discrete action spaces. The agent is implemented in PyTorch and uses a simple feedforward network as the Q-function approximator. It can be trained and evaluated in a variety of environments.
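For orientation, a feedforward Q-network of this kind can be very small. The sketch below is an illustration of the general idea, not the exact network in dqn.py; the `QNetwork` name and the layer sizes are assumptions:

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Simple feedforward approximator mapping a state to one Q-value per action."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.layers(state)


# e.g. CartPole: 4-dimensional state, 2 discrete actions
q_net = QNetwork(state_dim=4, action_dim=2)
q_values = q_net(torch.zeros(1, 4))  # shape (1, 2): one Q-value per action
```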
- Clone the repository:
git clone https://github.com/Uokoroafor/gym_projects.git
- Install the required dependencies:
cd gym_projects
pip install -r requirements.txt
The repository provides examples of using the DQN agent to solve the LunarLander and CartPole environments. You can find the examples in the examples/ folder.
This is a classic rocket trajectory optimization problem. The environment comes in two versions, discrete and continuous; this implementation uses the discrete one. The aim is to land the Lander on the landing pad, which is always at coordinates (0, 0).
There are four discrete actions available:
- 0: do nothing
- 1: fire left orientation engine
- 2: fire main engine
- 3: fire right orientation engine
The state is an 8-dimensional vector:
- The coordinates of the lander in x & y
- Its linear velocities in x & y
- Its angle
- Its angular velocity
- Two booleans that represent whether each leg is in contact with the ground or not.
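Both spaces can be inspected directly from the environment. A quick check, assuming the classic `gym` API (pre-0.26, where `step` returns a 4-tuple):

```python
import gym

env = gym.make("LunarLander-v2")
print(env.action_space)       # Discrete(4) - the four actions listed above
print(env.observation_space)  # an 8-dimensional Box - the state vector above

state = env.reset()                                              # initial state
state, reward, done, info = env.step(env.action_space.sample())  # one random step
env.close()
```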
An episode is considered solved if it scores at least 200 points.
The lander starts at the top center of the viewport with a random initial force applied to its center of mass.
The episode finishes if:
- The lander crashes (the lander body comes into contact with the moon);
- The lander gets outside of the viewport (its x coordinate is greater than 1);
- The lander is not awake.
To train the DQN agent on the LunarLander environment, for example, run the following command:
python examples/lunar_agents.py
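Under the hood, training follows the standard DQN recipe from Mnih et al.: epsilon-greedy exploration, a replay buffer, and a periodically synced target network. The sketch below is a self-contained illustration of that recipe, reusing the `QNetwork` class from earlier and the pre-0.26 `gym` API; it is not the exact code in agent.py:

```python
import random
from collections import deque

import gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("LunarLander-v2")
state_dim = env.observation_space.shape[0]  # 8-dimensional state
n_actions = env.action_space.n              # 4 discrete actions

q_net = QNetwork(state_dim, n_actions)       # the network sketched earlier
target_net = QNetwork(state_dim, n_actions)  # frozen copy for stable TD targets
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

buffer = deque(maxlen=100_000)  # simple uniform replay buffer
gamma, epsilon, batch_size = 0.99, 1.0, 64

for episode in range(1000):
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy exploration
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

        next_state, reward, done, _ = env.step(action)
        buffer.append((state, action, reward, next_state, done))
        state = next_state

        if len(buffer) >= batch_size:
            batch = random.sample(buffer, batch_size)
            s, a, r, s2, d = (torch.as_tensor(np.array(x), dtype=torch.float32)
                              for x in zip(*batch))
            # TD target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states
            with torch.no_grad():
                target = r + gamma * target_net(s2).max(dim=1).values * (1 - d)
            pred = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(pred, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    epsilon = max(0.01, epsilon * 0.995)  # decay exploration over episodes
    if episode % 10 == 0:
        target_net.load_state_dict(q_net.state_dict())  # periodic target sync
```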
The agent learns the environment, progressing from a random policy to one that can land the Lander on the landing pad.
This is part of the Classic Control environments and corresponds to a version of the cart-pole problem. A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pole starts upright on the cart, and the goal is to balance it by applying forces to the cart in the left and right directions.
There are two discrete actions available:
- 0: Push cart to the left
- 1: Push cart to the right
The state is a 4-dimensional vector:
- The position of the cart (x coordinate - between -2.4 and 2.4)
- The velocity of the cart (between -inf and inf)
- The angle of the pole (between -0.418 and 0.418 radians)
- The angular velocity of the pole (between -inf and inf)
An episode is considered solved if it scores at least 200 points for v0 and 500 points for v1.
All observations are initialised with uniform random values in (-0.05, 0.05).
The episode finishes if:
- The absolute value of the pole angle is more than 0.2095 radians
- The absolute value of the cart position is more than 2.4 (center of the cart reaches the edge of the display)
- The episode length is greater than 200 for v0 and 500 for v1
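These limits are easy to observe by rolling out a random policy. A small check, again assuming the pre-0.26 `gym` step API:

```python
import gym

env = gym.make("CartPole-v1")
state = env.reset()
done, steps = False, 0
while not done:
    state, reward, done, _ = env.step(env.action_space.sample())  # random action
    steps += 1

x, _, theta, _ = state
print(f"Terminated after {steps} steps: |x| = {abs(x):.2f}, |theta| = {abs(theta):.3f} rad")
env.close()
```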
To train the DQN agent on the CartPole environment, run:
python examples/cartpole_agents.py
The agent learns the environment, going from a random policy (left) to one that can balance the pole on the cart (right).
The DQNs in the training examples are trained for 1000 episodes and evaluated for 10 episodes. They produce the training and evaluation plots below:
As the training curves show, DQN training is unstable, and multiple runs are often required to achieve good results. The evaluation curves show that the agent learns the environment and achieves the desired score. A later repository will explore more sophisticated techniques.
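Evaluation simply runs the learned policy greedily (no exploration) and averages the episode returns. A minimal sketch, reusing the `env` and `q_net` names from the training sketch above (the same loop applies to CartPole):

```python
import torch

eval_returns = []
for _ in range(10):  # evaluate for 10 episodes
    state, done, total = env.reset(), False, 0.0
    while not done:
        with torch.no_grad():
            # Greedy action: argmax over predicted Q-values (epsilon = 0)
            action = q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()
        state, reward, done, _ = env.step(action)
        total += reward
    eval_returns.append(total)

print(f"Mean return over 10 evaluation episodes: {sum(eval_returns) / 10:.1f}")
```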
The repository has the following structure:
├── dqn.py # DQN model implementation
├── replaybuffer.py # Replay buffer implementation
├── agent.py # DQN agent implementation
├── examples/ # Example usage of the DQN agent
├── images/ # Directory for saving rendering images and GIFs
├── utils/ # Utility functions
├── LICENSE # License file
├── requirements.txt # Requirements file
├── gym_environments.ipynb # Jupyter notebook with usage examples
└── README.md # Project README file
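Of these, replaybuffer.py implements experience replay. A minimal version of the idea (not necessarily the repository's exact class) might look like:

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size buffer storing (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```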
This project is licensed under the MIT License.
Contributions to this repository are welcome. If you find any issues or have suggestions for improvements, please open an issue or a pull request.
This project builds on Reinforcement Learning coursework from Imperial College London and was also inspired by the Deep Q-Network algorithm and the OpenAI Gym environments. I would like to thank Prof Aldo Faisal, Dr Paul Bilokon and the rest of the teaching team for setting up the course and providing the learning resources.
- Playing Atari with Deep Reinforcement Learning, Volodymyr Mnih et al. (2013)
- Human-level control through deep reinforcement learning, Volodymyr Mnih et al. (2015)