The solution uses a Deep Reinforcement Learning approach (Deep Q-Learning).
The repository contains a trained network (after 1000 iterations).
The agent observes a state made of six yes/no features:
- wall/tail directly ahead
- wall/tail directly to the right
- wall/tail directly to the left
- snack ahead (at any distance)
- snack to the right (at any distance)
- snack to the left (at any distance)
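The state features above can be sketched in Python as follows. This is a minimal illustration, not the repository's code. It assumes a grid of `width` x `height` cells with `y` growing downward, a snake stored as a list of `(x, y)` cells with the head first, and a heading given as a unit vector. The names `encode_state`, `is_blocked`, `turn_right` and `turn_left` are hypothetical. "Snack ahead/right/left at any distance" is interpreted here as the snack lying anywhere on that side of the head, which is an assumption.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def turn_right(d: Point) -> Point:
    # Clockwise rotation in screen coordinates (y grows downward).
    return (-d[1], d[0])

def turn_left(d: Point) -> Point:
    # Counter-clockwise rotation in screen coordinates.
    return (d[1], -d[0])

def is_blocked(cell: Point, snake: List[Point], width: int, height: int) -> bool:
    # A cell is dangerous if it lies outside the grid or on the snake's body.
    x, y = cell
    out_of_bounds = not (0 <= x < width and 0 <= y < height)
    return out_of_bounds or cell in snake

def encode_state(snake: List[Point], direction: Point, food: Point,
                 width: int, height: int) -> List[int]:
    head = snake[0]
    # Ahead, right and left, all relative to the current heading.
    dirs = [direction, turn_right(direction), turn_left(direction)]
    # Danger features: wall or body in the adjacent cell on each side.
    danger = [int(is_blocked((head[0] + d[0], head[1] + d[1]), snake, width, height))
              for d in dirs]
    # Snack features (assumed interpretation): positive projection of the
    # head-to-snack vector onto each relative direction, regardless of distance.
    fx, fy = food[0] - head[0], food[1] - head[1]
    snack = [int(fx * d[0] + fy * d[1] > 0) for d in dirs]
    return danger + snack
```

For a snake at `[(0, 5), (1, 5), (2, 5)]` heading up (`(0, -1)`) on a 10 x 10 grid with the snack at `(3, 1)`, the wall is to the left, the body is to the right, and the snack is both ahead and to the right, giving `[0, 1, 1, 1, 1, 0]`.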
The agent chooses one of three actions, relative to its current heading:
- do nothing (keep going in the current direction)
- turn right
- turn left
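Because the actions are relative to the heading, they can be applied with a simple rotation of the direction vector. The sketch below assumes the action indices follow the order of the list above (`0` = straight, `1` = right, `2` = left) and screen coordinates with `y` growing downward; `apply_action` and the constants are hypothetical names.

```python
from typing import Tuple

Point = Tuple[int, int]

# Hypothetical action indices, in the order listed above.
STRAIGHT, RIGHT, LEFT = 0, 1, 2

def apply_action(direction: Point, action: int) -> Point:
    dx, dy = direction
    if action == STRAIGHT:
        return direction      # keep the current heading
    if action == RIGHT:
        return (-dy, dx)      # rotate clockwise (y grows downward)
    if action == LEFT:
        return (dy, -dx)      # rotate counter-clockwise
    raise ValueError(f"unknown action: {action}")
```

A side effect of this design is that the snake can never reverse into itself with a single action, and the same three outputs work for every heading.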
Rewards:
- +1 for eating a snack
- -1 for hitting a wall or the tail
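A minimal sketch of this reward scheme follows. The README does not state the reward for an ordinary move, so the `0.0` below is an assumption. The hypothetical `discounted_return` helper shows how the discount rate `GAMMA` (0.95, from the table below) weights a snack that is reached a few steps later.

```python
from typing import List

def reward(ate_snack: bool, collided: bool) -> float:
    if collided:
        return -1.0   # hit a wall or the tail
    if ate_snack:
        return 1.0    # ate a snack
    return 0.0        # assumption: ordinary moves are not rewarded

def discounted_return(rewards: List[float], gamma: float = 0.95) -> float:
    # Sum of gamma**t * r_t over the episode.
    return sum(r * gamma ** t for t, r in enumerate(rewards))
```

With this scheme, a snack eaten on the third step (`[0, 0, 1]`) is worth `0.95 ** 2 = 0.9025` at the start, so the agent prefers shorter paths to the snack.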
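Hyperparameters:

| Param | Value | Info |
|---|---|---|
| LEARNING_RATE | 0.001 | Learning rate |
| GAMMA | 0.95 | Discount rate |
| EPSILON | 1.0 | Initial exploration rate |
| EPSILON_DECAY | 0.995 | Exploration-rate decay factor |
| EPSILON_MIN | 0.01 | Minimum exploration rate |
| MEMORY | 2000 | Experience replay buffer size |
| MINI_BATCH | 32 | Mini-batch size sampled from replay memory |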
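The sketch below shows how these hyperparameters typically interact in a DQN agent: a bounded replay memory, epsilon-greedy action selection with multiplicative decay, mini-batch sampling, and the standard Q-learning target. It is an illustration using only the standard library, not the repository's implementation. The class and method names are hypothetical, the neural network that produces `q_values` is not shown, and `LEARNING_RATE` is omitted because it only affects the network optimizer.

```python
import random
from collections import deque
from typing import List, Sequence

# Values from the table above.
GAMMA = 0.95
EPSILON, EPSILON_DECAY, EPSILON_MIN = 1.0, 0.995, 0.01
MEMORY, MINI_BATCH = 2000, 32

class Agent:
    def __init__(self, n_actions: int = 3):
        self.n_actions = n_actions
        self.epsilon = EPSILON
        # Bounded replay memory: the oldest transitions are dropped first.
        self.memory = deque(maxlen=MEMORY)

    def remember(self, state, action, reward, next_state, done) -> None:
        self.memory.append((state, action, reward, next_state, done))

    def act(self, q_values: Sequence[float]) -> int:
        # Epsilon-greedy: explore with probability epsilon, else pick the best action.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: q_values[a])

    def sample(self) -> List[tuple]:
        # Random mini-batch of stored transitions for one training step.
        return random.sample(list(self.memory), min(MINI_BATCH, len(self.memory)))

    def decay_epsilon(self) -> None:
        # Multiplicative decay, clamped at EPSILON_MIN.
        self.epsilon = max(EPSILON_MIN, self.epsilon * EPSILON_DECAY)

def q_target(reward: float, next_q_values: Sequence[float], done: bool) -> float:
    # Standard Q-learning target: no bootstrapping after a terminal move.
    return reward if done else reward + GAMMA * max(next_q_values)
```

With these values, epsilon stays above `EPSILON_MIN` for 918 decays and is clamped to it on the 919th (`0.995 ** 919` is about 0.00999). If the decay were applied once per training iteration, which is an assumption here, exploration would bottom out shortly before the 1000th iteration.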