Implementation of deep reinforcement learning algorithms from scratch

Ali-Jasim/DoubleDuellingDeepQNetwork

Double Duelling Deep Q Network

LunarLanderGif

Requirements

Quick start

pip install -r requirements.txt
python env.py

Deep Q Learning

  • What is Q learning?

    • A model-free reinforcement learning algorithm that learns the Quality (Q) value of taking an Action in a particular State, i.e. the return expected from that action onward.

      Learn more

    • Following the Bellman update equation, we can train an agent to take high-quality actions that lead to states maximizing the cumulative reward (the return).

      Bellman Equation image

    • We construct a Quality table indexed by state and action, and iteratively update its entries with the equation above using the observed rewards.
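The table update described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function name `q_update`, the learning rate `alpha`, and the discount factor `gamma` are assumptions chosen for the example.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, terminated,
             n_actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    # Best value reachable from the next state; nothing to bootstrap from
    # if the episode has terminated.
    if terminated:
        best_next = 0.0
    else:
        best_next = max(q_table[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    td_error = td_target - q_table[(state, action)]
    # Move the stored estimate a fraction alpha toward the target.
    q_table[(state, action)] += alpha * td_error
    return q_table[(state, action)]


# Unseen (state, action) pairs default to a Q value of 0.0.
q_table = defaultdict(float)
new_q = q_update(q_table, state=0, action=1, reward=1.0, next_state=1,
                 terminated=False, n_actions=2)
```

Starting from an all-zero table, a single reward of 1.0 moves the entry for `(0, 1)` a fraction `alpha` of the way toward the target.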

  • Applying Deep Learning

    • Instead of storing a table of state transitions, use neural networks to approximate the Q function.

      Why? When dealing with extremely large or continuous state spaces, storing the Quality function in a table is no longer feasible.
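To make the idea of a function approximator concrete, here is a dependency-free sketch of a tiny feed-forward network that maps a state vector to one Q estimate per action. Everything in it is an assumption for illustration: the helper names, the layer sizes (8 inputs and 4 outputs, chosen to resemble a LunarLander-style observation and action space), and the initialization. The repository's actual network is not shown here and likely uses a deep learning library.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu):
    """Compute W @ x + b, optionally followed by a ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def q_values(state, layers):
    """Forward pass: state vector in, one Q estimate per action out."""
    h = state
    for i, layer in enumerate(layers):
        # ReLU on hidden layers only; the output layer stays linear.
        h = dense(h, layer, relu=(i < len(layers) - 1))
    return h


rng = random.Random(0)
STATE_DIM, HIDDEN, N_ACTIONS = 8, 16, 4  # illustrative sizes (assumption)
layers = [init_layer(STATE_DIM, HIDDEN, rng),
          init_layer(HIDDEN, N_ACTIONS, rng)]

state = [0.1] * STATE_DIM
qs = q_values(state, layers)
# Greedy policy: pick the action with the highest estimated Q value.
greedy_action = max(range(N_ACTIONS), key=lambda a: qs[a])
```

The network replaces the table lookup: instead of reading `Q[state][action]`, the agent runs a forward pass and reads the output at the action's index.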

    • Replay Buffer

      • Represents the agent's memory
      • Store transitions on every step (state, action, reward, next_state, terminated)
      • Circular insertion: once the buffer is full, new transitions overwrite the oldest ones
      • Samples batches of transitions for neural network training
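A replay buffer with the properties listed above might look like the following. This is a sketch under assumptions: the class name `ReplayBuffer` and its method names are hypothetical, and the repository may store transitions in preallocated arrays instead of a Python list.

```python
import random


class ReplayBuffer:
    """Fixed-capacity memory of transitions with circular insertion."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.write_index = 0  # slot the next transition will be written to

    def store(self, state, action, reward, next_state, terminated):
        transition = (state, action, reward, next_state, terminated)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            # Buffer is full: overwrite the oldest transition.
            self.storage[self.write_index] = transition
        self.write_index = (self.write_index + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        """Return batch_size distinct transitions chosen uniformly at random."""
        return rng.sample(self.storage, batch_size)

    def __len__(self):
        return len(self.storage)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.store(state=step, action=0, reward=1.0,
                 next_state=step + 1, terminated=False)
batch = buffer.sample(2)
```

With a capacity of 3 and 5 stored transitions, the two oldest are overwritten and only the three most recent remain available for sampling.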
    • New Update Equation:

      DQN

    • Pseudocode PseudoCode

      Note: Mean Squared Error is used for the loss, with gradients computed by backpropagation and weights updated by Stochastic Gradient Descent
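The target and loss used in a DQN training step can be illustrated without a deep learning library. The sketch below assumes the common formulation in which the target is the reward plus the discounted maximum Q value of the next state, with no bootstrapping on terminal transitions. The function names and the sample numbers are made up for the example.

```python
def td_targets(rewards, next_q_values, terminated, gamma=0.99):
    """Compute reward + gamma * max_a Q(next_state, a) for each transition."""
    targets = []
    for reward, next_qs, done in zip(rewards, next_q_values, terminated):
        # A terminal transition has no future value to bootstrap from.
        bootstrap = 0.0 if done else gamma * max(next_qs)
        targets.append(reward + bootstrap)
    return targets


def mse_loss(predicted, targets):
    """Mean squared error between predicted Q values and their targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)


# A batch of two transitions with two actions each (illustrative numbers).
rewards = [1.0, -1.0]
next_q_values = [[0.5, 2.0], [3.0, 1.0]]
terminated = [False, True]

targets = td_targets(rewards, next_q_values, terminated, gamma=0.5)
predicted = [1.0, 0.0]  # Q(state, action) for the actions actually taken
loss = mse_loss(predicted, targets)
```

In a full training loop, `predicted` would come from the network's forward pass and the loss would be backpropagated to update its weights.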

  • Modifications

    • Double Deep Q Networks

      Purpose: Stabilize training

      • Use two Neural Networks

        • Q Network
        • Q_target Network
      • Calculate the loss between the Q Network's prediction and a target built from the reward and the Q_target Network's estimate for the next state

      • Copy Q Network weights to Q_target Network every N iterations

      • Architecture Diagram DDQN

      • Updated Equation:

        DDQN
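The two-network scheme can be sketched as follows. This assumes the usual Double DQN formulation, in which the Q Network selects the best next action and the Q_target Network evaluates it. Whether the repository computes its target exactly this way is an assumption, and the function names and the dictionary used to stand in for network weights are hypothetical.

```python
import copy


def double_dqn_target(reward, next_q_online, next_q_target, terminated,
                      gamma=0.99):
    """Target where the online net picks the action and the target net scores it."""
    if terminated:
        return reward
    # Action selection uses the online Q Network's estimates...
    best_action = max(range(len(next_q_online)),
                      key=lambda a: next_q_online[a])
    # ...while its value is read from the Q_target Network.
    return reward + gamma * next_q_target[best_action]


def maybe_sync(online_weights, target_weights, step, sync_every):
    """Return a fresh copy of the online weights every sync_every steps."""
    if step % sync_every == 0:
        return copy.deepcopy(online_weights)
    return target_weights


target = double_dqn_target(reward=1.0,
                           next_q_online=[0.2, 0.9],
                           next_q_target=[5.0, 3.0],
                           terminated=False, gamma=0.5)

online = {"w": [1.0, 2.0]}      # stand-in for Q Network weights
frozen = {"w": [0.0, 0.0]}      # stand-in for Q_target Network weights
unchanged = maybe_sync(online, frozen, step=7, sync_every=5)
synced = maybe_sync(online, frozen, step=10, sync_every=5)
```

In this example the online network prefers action 1, so the target uses the target network's value for action 1 (3.0) and not its own maximum (5.0), which is how the method reduces overestimation.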

    • Double Duelling Deep Q Networks

      Purpose: Faster convergence

      • Using the same technique as above, we change the neural network architecture to output a Value for being in a state and an Advantage estimate for each possible action, then combine the two to calculate Quality

      • Architecture diagram

        DuellingDQN

      • Updated Equation:

        DQN
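The step that combines Value and Advantage into Quality can be written as a small function. The sketch assumes the mean-subtracted aggregation commonly used for duelling networks, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The function name and the sample numbers are illustrative, and the repository may use a different aggregation.

```python
def duelling_q_values(state_value, advantages):
    """Combine a scalar state value with per-action advantages into Q values."""
    mean_advantage = sum(advantages) / len(advantages)
    # Subtracting the mean keeps the split between V and A identifiable.
    return [state_value + (adv - mean_advantage) for adv in advantages]


# One state with four actions (illustrative numbers).
qs = duelling_q_values(state_value=2.0, advantages=[1.0, 0.0, -1.0, 0.0])
```

Because the advantages are centered, the mean of the resulting Q values equals the state value, so the Value stream alone carries how good the state is.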

Results

Deep Q Network

graphimg1

Double Deep Q Network

graphimg2

Double Duelling Deep Q Network

graphimg3
