This project allows quick and easy experiment on OpenAI's Procgen Benchmark.
Unlike OpenAI's baseline, this project is implemented using PyTorch.
(A good agent only eats fruits but not other foods.)
Currently supported methods:
- Proximal Policy Optimization (PPO)
- Deep Q Learning [probably won't work]
$ conda env create -f environment.yml
$ conda activate procgen
- essential packages are: pytorch, procgen, tensorboard, tqdm
$ cd train
$ python run.py
$ cd train
$ python run.py --mixreg --num_levels 50 --l2 0
- All plots in the report are in
train/results/logs/PPO/plots
- All testing curves are in
train/results/logs/PPO/eval_csv
- Launching Tensorboard
$ cd train
$ tensorboard --logdir results/logs
$ cd train
$ python run.py --eval_model <path-to-your-model>
Argument | Default | Description |
---|---|---|
--eval_model |
None | The path of trained model (visualizing performance) |
--stack |
1 | The number of recent frames to stack together as input |
--flare |
False | Boolean flag for whether to use FLARE |
--mixreg |
False | Boolean flag for whether to use mixreg |
Argument | Default | Description |
---|---|---|
--env_name |
'fruitbot' | The name of the environment |
--num_envs |
64 | The number of copies for the environment |
--num_levels |
50 | The number of levels for the agent to train |
--start_level |
500 | The starting level for the agent to train |
Argument | Default | Description |
---|---|---|
--train_step |
5e6 | The total number of frames for the agent to train |
--train_resume |
0 | The checkpoint for agent to resume training |
--update_freq |
256 | The number of frames for each environment to gather for training |
--eval_freq |
10 | The frequency (per training loop) to evaluate performance |
--saving_freq |
10 | The frequency (per training loop) to save model |
--num_batches |
8 | The number of batches in one epoch (not batch size) |
--num_epochs |
3 | The number of epochs in one train step |
--clip_range |
0.2 | The range to clip policy deviation and value estimate deviation |
--gamma |
0.999 | The discount factor |
--lam |
0.95 | The hyperparameter in GAE |
--ent |
0.01 | The coefficient for entropy penalty |
--cl |
0.5 | The coefficient for value estimation |
--lr_start |
5e-4 | The learning rate for Adam |
Argument | Default | Description |
---|---|---|
--conv_dims |
[16, 32, 32] | The number of filters in each Impala block |
--fc_dims |
[256] | The number of hidden units in the fully connected layer |
- Note that although these are passed in as lists, this project doesn't support customizing the number of layers (by now). So the length of these two arguments should match the length of the default ones.