The RL trader aims to maximize profit by rebalancing the portfolio daily.
- The model trains on the first half of the stock prices and tests on the second half.
- The model predicts what action to take: buy/sell/hold based on historical data.
- An epsilon-greedy policy is used to allow for exploration.
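
The chronological split and the epsilon-greedy choice described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `split_half`, `epsilon_greedy`, and `action_values` are hypothetical, and the sketch assumes the model produces one score per action.

```python
import random

ACTIONS = ("buy", "sell", "hold")  # the three actions named in the text


def split_half(prices):
    """Split a price series chronologically: first half trains, second half tests."""
    mid = len(prices) // 2
    return prices[:mid], prices[mid:]


def epsilon_greedy(action_values, epsilon, rng=random):
    """Return an index into ACTIONS.

    With probability epsilon a random action is chosen (exploration);
    otherwise the action with the highest score is chosen (exploitation).
    action_values is a hypothetical list of model scores, one per action.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda i: action_values[i])
```

Exploration is normally used only during training; at test time `epsilon` would typically be set to 0 (or a small value) so the agent acts on what it has learned.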
Prediction Model:
- Linear Regression using Gradient Descent with Momentum
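
For reference, a linear model trained by gradient descent with momentum might look like the sketch below. It is an assumption-laden illustration rather than the repository's implementation: the function name, the mean-squared-error loss, and the default hyperparameters (`lr`, `momentum`, `epochs`) are all chosen here for demonstration, and it uses plain Python lists to stay dependency-free.

```python
def fit_linear_momentum(X, y, lr=0.01, momentum=0.9, epochs=500):
    """Fit y = w.x + b by minimizing mean squared error.

    Each step updates a velocity v = momentum * v - lr * gradient,
    then moves the parameters by v (classical momentum).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    vw, vb = [0.0] * d, 0.0  # velocities for the weights and the bias
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters.
        errs = [
            sum(wj * xj for wj, xj in zip(w, row)) + b - target
            for row, target in zip(X, y)
        ]
        # Gradients of the mean squared error.
        grad_w = [2.0 / n * sum(e * row[j] for e, row in zip(errs, X)) for j in range(d)]
        grad_b = 2.0 / n * sum(errs)
        # Momentum update: accumulate velocity, then step.
        vw = [momentum * v - lr * g for v, g in zip(vw, grad_w)]
        vb = momentum * vb - lr * grad_b
        w = [wj + v for wj, v in zip(w, vw)]
        b += vb
    return w, b
```

In a trading setting, each row of `X` would hold the state features and `y` the target the model is trained to predict; those choices are not specified in the text above.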
This is an example of how the trader acts on one of the stocks (JPM).
Green points mark where the trader buys, while red points mark where the trader sells.
We also compare the returns on the 5 years of test data with two other portfolios:
- Portfolio with equally-weighted stocks: 92.73%
- Portfolio with Reinforcement Learning agent: 114.23%
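
As a rough guide to how a figure like the equally-weighted return can be computed, here is a minimal sketch. It assumes a buy-and-hold portfolio with equal starting capital in each stock, which may differ from how the number above was actually produced; `equal_weight_return` is a hypothetical helper, and the prices in the usage lines are made up.

```python
def equal_weight_return(price_series):
    """Percentage return from splitting capital equally across stocks and holding.

    price_series: one list of prices per stock, ordered from the first
    to the last day of the test period.
    """
    # Growth factor of each stock over the whole period.
    growth = [prices[-1] / prices[0] for prices in price_series]
    # Equal weights mean the portfolio growth is the plain average.
    return (sum(growth) / len(growth) - 1.0) * 100.0


# Made-up prices: one stock doubles, the other stays flat.
example = equal_weight_return([[10.0, 20.0], [10.0, 10.0]])
```

The same percentage convention (final value over initial value, minus one) would apply to the RL agent's portfolio when comparing the two.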