Reinforcement Learning

| Paper | Notes | Author | Summary |
| --- | --- | --- | --- |
| Dream to Control: Learning Behaviors by Latent Imagination (ICLR '20) | HackMD | Raj | This paper learns long-horizon behaviors by propagating analytic value gradients through trajectories imagined with a recurrent state-space model (PlaNet, Hafner et al.). |
| The Value Equivalence Principle for Model-Based Reinforcement Learning (NeurIPS '20) | HackMD | Raj | This paper introduces and studies the concept of equivalence for reinforcement learning models with respect to a set of policies and value functions. It further shows that this principle can be leveraged to find models constrained by representational capacity that are better than their maximum-likelihood counterparts. |
| Stackelberg Actor-Critic: A Game-Theoretic Perspective | HackMD | Sharath | This paper formulates the interaction between the actor and the critic as a Stackelberg game and leverages the implicit function theorem to calculate accurate gradient updates for both. |
| Curriculum Learning for Reinforcement Learning Domains | HackMD | Sharath | This is a survey paper on curriculum learning methods in reinforcement learning. |
| Policy Gradient Methods for Reinforcement Learning with Function Approximation (NIPS 1999) | HackMD | Raj | This paper derives the policy gradient theorem and shows that policy iteration with arbitrary differentiable function approximation converges to a locally optimal policy. |
| Reinforcement Learning via Fenchel-Rockafellar Duality | HackMD | Sharath | This paper reviews the basic concepts of Fenchel duality and f-divergences and shows how these tools can be applied in reinforcement learning to derive algorithms that are robust both theoretically and practically. |
| High-Dimensional Continuous Control Using Generalized Advantage Estimation | HackMD | Raj | This paper combines the generalized advantage estimator, which trades a small bias for lower variance in policy gradient estimates, with TRPO-style trust-region updates to obtain stable policy improvement in practice. |
| Off-Policy Actor-Critic (ICML '12) | HackMD | Sharath | This paper presents the first off-policy actor-critic algorithm and derives a simple and elegant method that performs better than existing algorithms on standard reinforcement learning benchmark problems. |
| Combining Physical Simulators and Object-Based Networks for Control (ICRA '19) | HackMD | Sharath | The authors propose a hybrid dynamics model, Simulator-Augmented Interaction Networks, which incorporates Interaction Networks into a physics engine to solve complex real-world robotics control tasks. |
| Learning Agile and Dynamic Motor Skills for Legged Robots | HackMD | Sharath | This paper tackles the sim2real transfer problem for legged robots. |
| PAC Bounds for Multi-armed Bandit (COLT '02) | HackMD | Raj | This paper provides a technique for guaranteeing PAC bounds based on the reward distribution of the particular problem, achieving better sample complexity. |
| Deep Reinforcement Learning for Dialogue Generation | HackMD | Om | This paper discusses how better dialogue generation can be achieved using RL. It provides a technique for converting conversational properties such as informativity, coherence, and ease of answering into reward functions. |
| Rainbow: Combining Improvements in Deep Reinforcement Learning | HackMD | Om | This paper discusses add-ons to DQN and A3C that can improve their performance, namely Double DQN, Prioritized Experience Replay, Dueling Network Architecture, Distributional Q-Learning, and Noisy DQN. |
| The Option-Critic Architecture | HackMD | Om | This paper discusses an implementation of hierarchical reinforcement learning based on temporal abstractions. |
| Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets | HackMD | Om | This paper proposes methods for tackling distribution shift and provides experimental justification for them. |
| FeUdal Networks for Hierarchical Reinforcement Learning | HackMD | Om | This paper describes the FeUdal Network model, which employs a manager-worker hierarchy. |