See debug.py for an example of how everything can be run. Summary of contents:
- environments contains the environments (currently a single environment with a simple one-dimensional state space and uniformly distributed transitions, where one end of the state space is always preferred for reward maximization, so the adversarial transition function can be computed analytically)
- policies contains different policies I am experimenting with
- models contains nn.Module code for modelling the Q / beta / w functions (along with the critic functions for minimax methods)
- learners contains the learning algorithms (currently, a minimax algorithm for estimating Q/beta is implemented)
- utils contains some useful generic utilities
Libraries needed: torch, numpy, gymnasium.