
MyoChallenge - IARAI-JKU

This package contains the winning solution for the MyoChallenge die-reorientation task. It is adapted from the EvoTorch starter baseline.

Methodology

We build on the EvoTorch baseline, which uses PGPE with the ClipUp optimizer on a 3-layer RNN. Our primary approach to this hard exploration challenge combines potential-function-based reward shaping (see RUDDER, Arjona-Medina et al., for a detailed overview) with task subdivision via adaptive curricula (similar to POET, Wang et al.). We maintain a population of 256 environments, each of which starts at an "easy" difficulty and adapts its difficulty based on the agent's success over the recent past (the last 20 episodes). Environment difficulty is controlled via the goal_rot value, so the difficulty distribution is shaped by the agent's current performance; a sketch of this adaptation loop is given below.
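The per-environment adaptation can be sketched as follows. This is a minimal illustration only: the class name, the success-rate thresholds, and the goal_rot bounds are assumptions for exposition and are not taken from the notebook.

from collections import deque

class AdaptiveDifficulty:
    # Tracks recent successes for one environment and adjusts its goal_rot difficulty.
    def __init__(self, min_rot=0.1, max_rot=1.57, step=0.05, window=20):
        self.goal_rot = min_rot              # current difficulty (goal rotation magnitude)
        self.min_rot, self.max_rot = min_rot, max_rot
        self.step = step                     # difficulty change per adaptation
        self.recent = deque(maxlen=window)   # success flags of the last `window` episodes

    def record(self, success):
        # Called at the end of every episode in this environment.
        self.recent.append(bool(success))
        if len(self.recent) < self.recent.maxlen:
            return
        rate = sum(self.recent) / len(self.recent)
        if rate > 0.8:    # agent succeeds often: make the task harder
            self.goal_rot = min(self.goal_rot + self.step, self.max_rot)
        elif rate < 0.2:  # agent struggles: make the task easier
            self.goal_rot = max(self.goal_rot - self.step, self.min_rot)

# One tracker per environment in the population of 256
curricula = [AdaptiveDifficulty() for _ in range(256)]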

Our recently published work (presented at the DRL workshop at NeurIPS 2022) shows that minimizing task-irrelevant exploration speeds up learning and improves generalization: visiting task-irrelevant states forces the policy/value networks to fit irrelevant targets, which consumes capacity and hurts generalization. This was also our primary motivation for entering the challenge: to verify some of our ideas on the challenging continuous-control tasks posed here.

Essentially, using a reward function based on differences of a potential function avoids spurious optima (e.g., the agent deliberately ending the episode early to escape negative rewards) and also yields a reward that is much easier to optimize, since a positive reward can be obtained from any state by moving toward higher potential. Further, the curriculum minimizes task-irrelevant exploration, speeding up learning and allowing the trained policy to generalize much better to downstream tasks in the curriculum. The sketch below shows the general shaping scheme.
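As a minimal illustration, potential-based shaping replaces the raw reward r with r + gamma * phi(s') - phi(s) (Ng et al., 1999). The potential function below (negative orientation error of the die) is an assumed example for exposition, not the exact potential used in our solution.

def shaped_reward(r, phi_s, phi_s_next, gamma=0.99):
    # Potential-based shaping: add the discounted potential difference to the
    # environment reward; this leaves the optimal policy unchanged.
    return r + gamma * phi_s_next - phi_s

def potential(orientation_error):
    # Assumed example potential: higher as the die's orientation error shrinks.
    return -abs(orientation_error)

# Example: reducing the orientation error from 0.5 rad to 0.3 rad yields a positive bonus.
bonus = shaped_reward(0.0, potential(0.5), potential(0.3))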

Authors

Special thanks to Prof. Sepp Hochreiter & Dr. Michael Kopp for their guidance and support.

Setup

  1. Create the Conda environment
conda env create -f env.yml
  2. Activate the environment and launch JupyterLab
conda activate iarai-jku-myochallenge
jupyter lab
  3. Run train_die_reorient.ipynb. Further instructions and explanations can be found there.
