Download the code for RAINBOW from GitHub. Inside the folder Rainbow-master, run:
python3 main.py --id your_runname --seed 123 --game game_name --T-max 4000000  # game_name examples: bank_heist, road_runner, pong, freeway
The hyperparameters for training the teacher policy can be found in the Appendix of the paper.
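To train teachers for several games in a row, the command above can be generated from a small script. This is only a sketch: the helper `build_train_command` and the run-name pattern `teacher_<game>` are hypothetical and not part of the repository; the flags are the ones shown in the command above. By default it only prints the commands.

```python
import shlex

# Example games listed in this README.
GAMES = ["bank_heist", "road_runner", "pong", "freeway"]


def build_train_command(game, run_id, seed=123, t_max=4000000):
    """Return the argument list for one teacher-training run (hypothetical helper)."""
    return [
        "python3", "main.py",
        "--id", run_id,
        "--seed", str(seed),
        "--game", game,
        "--T-max", str(t_max),
    ]


if __name__ == "__main__":
    for game in GAMES:
        # The run name "teacher_<game>" is an assumption; any identifier works.
        cmd = build_train_command(game, run_id=f"teacher_{game}")
        print(shlex.join(cmd))
        # To launch the run instead of printing it, call for example:
        # subprocess.run(cmd, cwd="Rainbow-master", check=True)
```

Run it from the directory that contains Rainbow-master and copy the printed lines, or enable the `subprocess.run` call (with `import subprocess`) to start the runs one after another.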
Then run the code to collect the dataset:
python data_collection.py
## IP-KL
python PD_adaptive_importance_KL.py --game bank_heist
## IP-CE
python PD_adaptive_importance_CrossEntropy.py --game bank_heist
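For orientation, the KL and cross-entropy variants are named after two standard distillation losses between a teacher and a student action distribution. The sketch below shows the textbook definitions only; it is not taken from the scripts above, it omits any importance weighting they apply, and the function names and the `temperature` parameter are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn action values or logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """H(p_T, p_S) = -sum_a p_T(a) * log p_S(a)."""
    return -sum(p * math.log(q + eps)
                for p, q in zip(teacher_probs, student_probs))


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(p_T || p_S) = sum_a p_T(a) * log(p_T(a) / p_S(a))."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(teacher_probs, student_probs))


if __name__ == "__main__":
    # Made-up values for a single state with three actions.
    teacher = softmax([2.0, 1.0, 0.1], temperature=0.5)
    student = softmax([1.5, 1.2, 0.3])
    print("KL:", kl_divergence(teacher, student))
    print("CE:", cross_entropy(teacher, student))
```

The two losses differ only by the teacher's entropy, which does not depend on the student, so they give the same gradient with respect to the student when the teacher distribution is held fixed.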
## IP-based policy compression
python PD_adaptive_importance_compression.py --game bank_heist
The parameter settings for policy distillation are detailed in Appendix D of our paper.
## Evaluation
python evaluation.py --game bank_heist
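The distillation and evaluation steps can be chained for one game with a short driver script. This is a sketch under assumptions: the helper `build_commands` and the variant keys `kl`, `ce`, and `compression` are hypothetical, and it assumes `evaluation.py` needs no argument other than `--game`, which this README does not confirm. The script names and the `--game` flag are the ones used above.

```python
import shlex

# Distillation scripts named in this README, keyed by hypothetical short names.
DISTILL_SCRIPTS = {
    "kl": "PD_adaptive_importance_KL.py",
    "ce": "PD_adaptive_importance_CrossEntropy.py",
    "compression": "PD_adaptive_importance_compression.py",
}


def build_commands(game, variant):
    """Return the distillation command followed by the evaluation command."""
    if variant not in DISTILL_SCRIPTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return [
        ["python", DISTILL_SCRIPTS[variant], "--game", game],
        ["python", "evaluation.py", "--game", game],
    ]


if __name__ == "__main__":
    for cmd in build_commands("bank_heist", "kl"):
        print(shlex.join(cmd))
        # To execute each step instead of printing it, call for example:
        # subprocess.run(cmd, check=True)
```

Printing first makes it easy to check the commands before committing to a long run; enabling `subprocess.run` with `check=True` (and `import subprocess`) stops the chain if the distillation step fails.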