In-context Reinforcement Learning with Algorithm Distillation (Unofficial)

This is an unofficial implementation of the paper "In-context Reinforcement Learning with Algorithm Distillation". Only the DarkRoom environment is implemented in this repository.

Installation

pip install -r requirements.txt

Then set up wandb by running wandb login in the terminal.

Run Experiments

First, generate learning trajectories using a source RL algorithm (A2C).

python generate_lifetimes.py --env_id DarkRoom-v0

By default, generated trajectories are saved as darkroom_normal_*.pkl.
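
To check what was generated before training, the files can be opened with the standard pickle module. The following is a minimal sketch: the helper name load_lifetimes is hypothetical (it is not part of this repository), and the structure of the stored objects is not assumed, so the sketch only reports their types. Only unpickle files you generated yourself, since loading a pickle can execute arbitrary code.

```python
import glob
import pickle


def load_lifetimes(pattern="darkroom_normal_*.pkl"):
    """Load every pickle file matching `pattern`.

    Hypothetical helper for inspection only; returns {path: loaded object}.
    Nothing is assumed about what each file contains.
    """
    lifetimes = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            lifetimes[path] = pickle.load(f)
    return lifetimes


if __name__ == "__main__":
    # Print each file name together with the type of the object it holds.
    for path, obj in load_lifetimes().items():
        print(path, type(obj).__name__)
```

If no file matches the pattern, the helper returns an empty dictionary, which is a quick sign that generate_lifetimes.py has not been run in the current directory or saved its output elsewhere.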

Then, train the Transformer model.

python in_context_learner.py --env_id DarkRoom-v0

By default, the trained model is saved in output.

Finally, test the Transformer model by rolling out the policy.

python in_context_learner.py --env_id DarkRoom-v0 --eval

[Figure: an example of the final trajectories generated by the Transformer model.]

Note

So far I haven't reproduced the results in the paper. The hyperparameters likely still need tuning, and the sampling temperature used when decoding actions from the Transformer also appears to be a key factor: greedy decoding does not seem to work, because it yields a deterministic policy that never explores the environment.
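
The difference between greedy and temperature-based decoding can be illustrated with a small standalone sketch. This is not the code used in in_context_learner.py; the function name sample_action and the example logits are made up for illustration, and the five entries simply stand for five discrete actions.

```python
import math
import random


def sample_action(logits, temperature=1.0, rng=random):
    """Pick an action index from unnormalized logits.

    temperature <= 0 is treated as greedy decoding (argmax).
    Larger temperatures flatten the distribution and explore more.
    """
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, 0.0, -1.0]  # made-up logits over 5 actions
    rng = random.Random(0)
    greedy = {sample_action(logits, temperature=0) for _ in range(100)}
    sampled = {sample_action(logits, 1.0, rng) for _ in range(100)}
    print("distinct actions, greedy: ", len(greedy))
    print("distinct actions, sampled:", len(sampled))
```

Greedy decoding always returns the same action for the same logits, whereas sampling with a positive temperature spreads choices across actions in proportion to their softmax probabilities. The best temperature for this task is an open question here and would need to be tuned.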

