
XO (TicTacToe) Learning Environment

A framework for learning and testing reinforcement learning and tree-search algorithms on the simple game of XO (Tic-Tac-Toe). Based on a similar environment for Dots and Boxes.

Frontend

A live demo of the GUI is at safwankdb.github.io/xo

Usage

  • Start the agents

Each of these programs runs a game-playing agent. An agent listens for websocket messages that communicate game information and sends back the next action it wants to play.

$ python agents/youragent.py 8080
$ python agents/minimaxagent.py 8081
$ python agents/dqnagent.py --test 8082

This starts a websocket server on the given port that can receive JSON messages. The JSON messages given below should be handled by your agent.
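
As a starting point, a bare agent can be sketched like this (a minimal sketch assuming the third-party websockets package; the repo's agents may be structured differently):

import asyncio
import json
import sys

import websockets  # assumption: agents are built on the `websockets` package

async def play(ws):
    # Handle one connection: read JSON messages and dispatch on their "type"
    async for raw in ws:
        msg = json.loads(raw)
        print("received:", msg["type"])  # replace with real handling (see below)

async def main(port):
    async with websockets.serve(play, "127.0.0.1", port):
        await asyncio.Future()  # serve forever

if __name__ == "__main__":
    asyncio.run(main(int(sys.argv[1])))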

  • Compete two agents against each other

$ python xocompete.py ws://127.0.0.1:8081 ws://127.0.0.1:8082 --episodes 5000

  • Start the GUI server

$ python xoserver.py 8080

  • Communicating with the game

Both players get a message that a new game has started:

{
    "type": "start",
    "player": 1,
    "game": "123456"
}

where player is the number assigned to this agent.

If you are player 1, reply with the first action you want to perform:

{
    "type": "action",
    "location": [1, 1]
}

The field location is expressed as row and column (zero-based numbering).
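
For illustration, with a hypothetical 3x3 list-of-lists board (not necessarily how the repo represents state), a location is applied like this:

board = [[0] * 3 for _ in range(3)]  # 0 = empty cell
row, col = [1, 1]                    # the "location" field
board[row][col] = 1                  # player 1 marks the centre square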

When an action is played, the message sent to both players is:

{
    "type": "action",
    "game": "123456",
    "player": 1,
    "nextplayer": 2,
    "location": [1, 1],
}

If it is your turn, answer with a message that states your next move:

{
    "type": "action",
    "location": [1, 1],
}

When the game ends after an action, the message is slightly altered:

{
    "type": "end",
    "game": "123456",
    "player": 1,
    "nextplayer": 0,
    "location": [1, 1],
    "winner": 1
}

The type field becomes end and a new field winner is set to the player that has won the game.
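
Putting the messages together, the play coroutine from the sketch above could be filled in as a random-playing agent (again a sketch; the board bookkeeping and helper names here are illustrative, not the repo's code):

import json
import random

EMPTY = 0

def random_move(board):
    # Pick uniformly among empty squares
    free = [[r, c] for r in range(3) for c in range(3) if board[r][c] == EMPTY]
    return random.choice(free)

async def play(ws):
    board = [[EMPTY] * 3 for _ in range(3)]
    me = None
    async for raw in ws:
        msg = json.loads(raw)
        if msg["type"] == "start":
            board = [[EMPTY] * 3 for _ in range(3)]  # fresh game
            me = msg["player"]
            if me == 1:  # player 1 opens the game
                await ws.send(json.dumps({"type": "action", "location": random_move(board)}))
        elif msg["type"] in ("action", "end"):
            r, c = msg["location"]
            board[r][c] = msg["player"]  # record the move just played
            if msg["type"] == "action" and msg["nextplayer"] == me:
                await ws.send(json.dumps({"type": "action", "location": random_move(board)}))
            # on "end", simply wait for the next "start" message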

Provided Agents

  • randomagent: Chooses a move uniformly at random from all valid moves.
  • simpleagent: Chooses the valid move with the lowest index.
  • minimaxagent: Performs a full-depth minimax tree search to find the best moves; it can never lose.
  • alphabetaagent: Uses alpha-beta pruning in its tree search and finds sub-optimal moves.
  • dqnagent: Uses a Deep Q-Network to approximate the Q function and learns to play online.
Player 1 / Player 2    MiniMax      AlphaBeta     Random        Simple
MiniMax                0-0-1000     1000-0-0      989-0-11      1000-0-0
AlphaBeta              0-1000-0     1000-0-0      859-88-53     1000-0-0
Random                 0-815-185    208-598-194   582-310-108   545-427-28
Simple                 0-1000-0     765-180-55    0-1000-0      1000-0-0

Each cell shows Player 1's games won-lost-drawn out of 1000 games.
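
For reference, the core of a full-depth minimax search for XO can be sketched as follows (a minimal illustration under assumed board conventions, not the repo's minimaxagent):

def winner(board):
    lines = [[(r, 0), (r, 1), (r, 2)] for r in range(3)]           # rows
    lines += [[(0, c), (1, c), (2, c)] for c in range(3)]          # columns
    lines += [[(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)]]  # diagonals
    for line in lines:
        vals = {board[r][c] for r, c in line}
        if len(vals) == 1 and vals != {0}:
            return vals.pop()  # 1 or 2: that player has three in a row
    return 0  # no winner

def minimax(board, player, me):
    # Return (score, move) from `me`'s perspective: +1 win, 0 draw, -1 loss
    w = winner(board)
    if w:
        return (1 if w == me else -1), None
    moves = [(r, c) for r in range(3) for c in range(3) if board[r][c] == 0]
    if not moves:
        return 0, None  # board full: draw
    best = None
    for r, c in moves:
        board[r][c] = player  # try the move
        score, _ = minimax(board, 3 - player, me)
        board[r][c] = 0       # undo it
        if best is None or (score > best[0] if player == me else score < best[0]):
            best = (score, [r, c])
    return best

Calling minimax(board, me, me)[1] then yields the move to send in the action reply.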

TODO

  • Host the xoserver somewhere.
  • Write xocompete.py for playing two agents against each other.
  • Write a random agent.
  • Write xoserver.py and a frontend for human players.
  • Write the DQN.
  • Add a convolutional architecture to the DQN.
  • Write a MiniMax agent.
  • Write an AlphaBeta search-tree agent.
  • Write a simple Q-learning agent.
  • Write a SARSA agent.
  • Write a Dueling DQN agent.
  • Let agents report their names while playing.
