We provide code for evaluating the safety and sample efficiency of our proposed RL framework.
For safety, we use the Safe Set Algorithm (SSA).
For sample efficiency, you can choose among the following strategies:
1. Adapting SSA;
2. Exploration (PSN, RND, or none);
3. Learning from SSA.
The video result is shown below: the agent is trained to drive to the goal while avoiding dynamic obstacles. Red indicates that SSA is triggered.
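Conceptually, SSA acts as a safety layer on top of the RL policy: whenever the nominal action would violate the safety constraint, it is replaced by a minimally modified safe action. The snippet below is only a rough sketch of this wrapper pattern; the names (`SafetyFilter`, `is_safe`, `project_to_safe_set`) are placeholders, not the actual interfaces in this repository.

```python
# Conceptual sketch of an SSA-style safety layer.
# All names here are placeholders, not the classes/functions used in this repo.
class SafetyFilter:
    def __init__(self, safe_controller):
        self.safe_controller = safe_controller  # e.g. SSA, CBF, or Shield

    def filter(self, state, rl_action):
        # Keep the RL action if it already satisfies the safety constraint.
        if self.safe_controller.is_safe(state, rl_action):
            return rl_action, False
        # Otherwise fall back to a minimally modified safe action.
        safe_action = self.safe_controller.project_to_safe_set(state, rl_action)
        return safe_action, True  # second value flags that SSA was triggered
```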
Please cite our paper as:
@article{chen2021safe,
title={Safe and sample-efficient reinforcement learning for clustered dynamic environments},
author={Chen, Hongyi and Liu, Changliu},
journal={IEEE Control Systems Letters},
volume={6},
pages={1928--1933},
year={2021},
publisher={IEEE}
}
To set up the environment:
conda create -n safe-rl
conda activate safe-rl
conda install python=3.7.9
pip install tensorflow==2.2.1
pip install future
pip install keras
pip install matplotlib
pip install gym
pip install cvxopt
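After installation, a quick sanity check (a minimal sketch, not part of the repository) can confirm that the main dependencies import correctly:

```python
# Quick sanity check for the installed dependencies (illustrative only).
import tensorflow as tf
import gym
import cvxopt
import matplotlib

print("TensorFlow:", tf.__version__)    # expected 2.2.1
print("Gym:", gym.__version__)
print("cvxopt:", cvxopt.__version__)
print("matplotlib:", matplotlib.__version__)
```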
To train the agent, run one of the following:
python train.py --display {none, turtle} --explore {none, psn, rnd} --no-qp --no-ssa-buffer
python train.py --display {none, turtle} --explore {none, psn, rnd} --qp --no-ssa-buffer
python train.py --display {none, turtle} --explore {none, psn, rnd} --no-qp --ssa-buffer
--display can be either none or turtle (visualization).
--explore specifies the exploration strategy that the robot uses.
--no-qp means that we use vanilla SSA.
--qp means that we use adapted SSA.
--no-ssa-buffer means that we use the default learning.
--ssa-buffer means that we use safe learning from SSA demonstrations (a rough sketch of this idea follows below).
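As a rough illustration of the learning-from-SSA idea behind --ssa-buffer: whenever SSA overrides the policy's action, the corrected transition can be stored in a separate demonstration buffer and mixed into the policy updates. The sketch below is illustrative only; the buffer class, capacity, and sampling ratio are assumptions, not the repository's actual implementation.

```python
import random

# Minimal sketch of an SSA demonstration buffer (illustrative only; the actual
# repository may structure this differently).
class SSADemoBuffer:
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.buffer = []

    def add(self, state, safe_action, reward, next_state, done):
        # Store the SSA-corrected transition as a demonstration.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        self.buffer.append((state, safe_action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# During training (sketch): if SSA modified the action, also log it as a demo.
# if ssa_triggered:
#     demo_buffer.add(state, safe_action, reward, next_state, done)
# batch = replay_buffer.sample(48) + demo_buffer.sample(16)  # mix demos into updates
```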
You may also test other safe controllers (CBF, Shield) by uncommenting lines 108-109 and 155-157.
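For reference, a control barrier function (CBF) filter is typically implemented as a small quadratic program that finds the control closest to the nominal action while keeping the barrier condition satisfied. The sketch below solves such a QP with cvxopt (already installed above); it is a generic illustration, not the CBF code in this repository, and the dynamics terms (f_x, g_x, grad_h, alpha) are placeholders.

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

def cbf_qp_filter(u_nom, h_x, grad_h, f_x, g_x, alpha=1.0):
    """Generic CBF-QP safety filter (illustrative sketch, not this repo's code).

    Finds the control u closest to u_nom that satisfies
        grad_h . (f(x) + g(x) u) + alpha * h(x) >= 0
    for control-affine dynamics x_dot = f(x) + g(x) u and barrier h(x) >= 0.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    m = u_nom.shape[0]

    # Objective: 0.5 * u^T P u + q^T u  ==  0.5 * ||u - u_nom||^2 (up to a constant)
    P = matrix(np.eye(m))
    q = matrix(-u_nom)

    # Constraint G u <= h_vec  <=>  -(grad_h @ g) u <= grad_h @ f + alpha * h(x)
    G = matrix(-(grad_h @ g_x).reshape(1, m))
    h_vec = matrix(np.array([grad_h @ f_x + alpha * h_x]))

    sol = solvers.qp(P, q, G, h_vec)
    return np.array(sol['x']).flatten()

# Example: 2D single integrator (x_dot = u) staying outside a circular obstacle.
x = np.array([1.0, 0.5])
obstacle, radius = np.array([2.0, 0.5]), 0.5
h_x = np.sum((x - obstacle) ** 2) - radius ** 2   # barrier h(x) >= 0
grad_h = 2 * (x - obstacle)                       # gradient of h
u_safe = cbf_qp_filter(u_nom=[1.0, 0.0], h_x=h_x, grad_h=grad_h,
                       f_x=np.zeros(2), g_x=np.eye(2))
print(u_safe)  # nominal action pushed back so it no longer drives into the obstacle
```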
Part of the simulation environment code comes from the course CS 7638: Artificial Intelligence for Robotics at Georgia Tech. We obtained permission from the lecturer Jay Summet to use this code for research.