
Mirror of the capstone project developed for the Columbia course "Big Data, Machine Learning, and their Real World Applications", which I worked on.


Ferryistaken/Capstone-Mirror


Machine Learning Stock Predictions

Rakshit Kaushik, Alessandro Ferrari, Sergio Papa Estefano, Rishi Bhargava

Data

Past stock data is obtained using the quantmod package for R. Quantmod stands for quantitative financial modeling framework, and it is used to "specify, build, trade, and analyse quantitative financial trading strategies" (cran.r-project.org). The opening, high, low, closing, and adjusted closing prices of a stock can be obtained using the getSymbols() function. By default, it provides data for every trading day from January 3, 2007 to the current date.


Importance of data:

  • Stock market data (opening, closing, and high/low prices) => technology: train an AI model on historical data to predict stock prices => if successful, we can deploy this application as a private tool for our group to use when investing => less helpful (compared with the next example)
  • Stock market data (opening, closing, and high/low prices) => technology: train an AI model on historical data to predict stock prices => if successful, we can deploy this application as an open source package for individuals to use when investing => more helpful (community impact)

Benchmark

Existing projects include:

  • MCMC Simulation/MCTS
    • MCMC randomly generates paths that the stock price could follow
    • MCTS uses past data to tune the parameters used in the MCMC simulation
      • parameters: mean, standard deviation
    • If the simulated data accurately describes the historical data, it can be used to make predictions
  • Sentiment Analysis of Newspapers
    • Uses past stock data and newspaper articles
    • Sentiment of the articles is analyzed using the Natural Language Toolkit (NLTK) package
    • Stock prices and sentiment are the explanatory variables for a neural network; the predicted stock price is the response variable
  • Brownian Motion
    • Relies on the observation that plots of simulated particle movement resemble plots of stock returns (gif credit: yiqiao-yin)
    • Parameters for Brownian motion can be tuned using past data to predict future stock trends
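
The simulation-based approaches above share one idea: estimate a mean and standard deviation from past data, then generate random price paths from them. The following is a minimal sketch of that idea using plain Monte Carlo sampling with normally distributed daily log returns (a discrete form of geometric Brownian motion). It is not the code of the listed projects and does not implement MCMC or MCTS tuning; the function names and the price list are hypothetical and exist only for illustration.

```python
import math
import random
import statistics


def estimate_parameters(prices):
    """Estimate the mean and standard deviation of daily log returns."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return statistics.mean(log_returns), statistics.stdev(log_returns)


def simulate_paths(last_price, mu, sigma, n_days, n_paths, seed=0):
    """Simulate price paths whose daily log returns are drawn from N(mu, sigma)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    paths = []
    for _ in range(n_paths):
        price = last_price
        path = [price]
        for _ in range(n_days):
            # Multiply by exp(log return) so simulated prices stay positive.
            price *= math.exp(rng.gauss(mu, sigma))
            path.append(price)
        paths.append(path)
    return paths


# Hypothetical closing prices, used only for illustration.
history = [100.0, 101.2, 100.7, 102.3, 103.1, 102.6, 104.0, 105.2]
mu, sigma = estimate_parameters(history)
paths = simulate_paths(history[-1], mu, sigma, n_days=5, n_paths=1000)
final_prices = [path[-1] for path in paths]
print(f"mean simulated price after 5 days: {statistics.mean(final_prices):.2f}")
```

The spread of final_prices gives a rough sense of the uncertainty in the forecast, which a single predicted number would hide.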

Proposed Model/Algorithm:

  1. Linear Regression: y = α + βx + ε, where x = time, y = stock price, α = y-intercept, β = slope, and ε = error. Linear regression is used to find a linear relationship between two variables, in our case time and stock price. While linear regression can reveal a trend in stock data, it is not well suited to predicting stock prices: a fitted line cannot anticipate sudden price changes, which can cause a user to lose money.
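
As a rough illustration of the regression above, the sketch below fits α and β by ordinary least squares on a few made-up (time, price) points. The function name fit_line and the data are assumptions for this example, not part of the project.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # slope
    alpha = mean_y - beta * mean_x   # y-intercept
    return alpha, beta


# Hypothetical data: x is the day index, y is the closing price.
days = [0, 1, 2, 3, 4]
prices = [10.0, 12.5, 13.5, 16.5, 17.5]

alpha, beta = fit_line(days, prices)
# The residuals play the role of the error term in the formula.
residuals = [y - (alpha + beta * x) for x, y in zip(days, prices)]
print(f"alpha={alpha:.2f}, beta={beta:.2f}")
```

The fitted line captures only the overall trend; the residuals show how far individual days can sit from it.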


  2. Recurrent Neural Network using stock returns: RNNs are designed for sequence prediction problems, which makes them a natural fit for stock data. The neural network uses stock returns as both the explanatory and response variables. Another option would be to use the closing price: a recurrent neural network could learn from past stock prices and attempt to predict future ones. But stock price trends vary from year to year, so training a model to predict next year's closing prices from last year's closing prices is not ideal. Stock returns don't have as much variation and are better suited for making predictions with an RNN.
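
To make "returns as both explanatory and response variables" concrete, here is a minimal sketch of how closing prices could be turned into returns and then into (input window, next value) pairs for a sequence model. The text does not say whether simple or log returns were used, or what window length was chosen, so simple returns and a window of 3 are assumptions here, as are the function names and the prices.

```python
def simple_returns(prices):
    """Day-over-day simple returns: (p_t - p_{t-1}) / p_{t-1}."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def make_windows(series, window):
    """Split a series into input windows and the value that follows each one."""
    inputs, targets = [], []
    for i in range(len(series) - window):
        inputs.append(series[i:i + window])   # explanatory: the last `window` returns
        targets.append(series[i + window])    # response: the next return
    return inputs, targets


# Hypothetical closing prices, used only for illustration.
closes = [100.0, 102.0, 101.0, 103.0, 106.0, 105.0]
returns = simple_returns(closes)
X, y = make_windows(returns, window=3)
print(f"{len(returns)} returns -> {len(X)} training samples")
```

Passing the closing prices directly to make_windows instead of the returns would produce the data layout for the closing-price variant of the model.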

[Figure: AAPL returns]

  3. Recurrent Neural Network using golden crosses: A crossover occurs when the plotted line of a stock's short-term average crosses the line of its long-term average. If the short-term average starts below the long-term average and crosses above it, the pattern is called a golden cross. Otherwise, it's called a death cross. A golden cross is commonly read as a signal of a bull market. Our model attempts to predict the stock price outcome after a golden cross. Instead of stock returns, our explanatory variables are the closing prices and the difference between the short- and long-term averages. The response variable is the closing price after the golden cross. When the two lines cross, the difference of their values is 0, so crossovers are represented in our data as the points where this difference is 0.


  4. Recurrent Neural Network using closing prices: As mentioned above, stock price data isn't well suited to making accurate predictions with an RNN. To test this claim, we tried using closing prices as both the explanatory and response variables in our RNN.
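
The golden-cross model above depends on the difference between a short-term and a long-term average changing sign. The sketch below shows one way to compute that difference and label the crossovers. The window lengths (2 and 4), the function names, and the prices are assumptions chosen to keep the example small; the project's actual windows are not stated here.

```python
def moving_average(values, window):
    """Trailing simple moving average; None until `window` values are available."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(values[i + 1 - window:i + 1]) / window)
    return averages


def find_crosses(prices, short_window, long_window):
    """Return the short-minus-long differences and the labeled crossover days."""
    short_ma = moving_average(prices, short_window)
    long_ma = moving_average(prices, long_window)
    diffs = [
        None if s is None or l is None else s - l
        for s, l in zip(short_ma, long_ma)
    ]
    crosses = []
    for i in range(1, len(diffs)):
        prev, cur = diffs[i - 1], diffs[i]
        if prev is None or cur is None:
            continue
        if prev <= 0 < cur:
            crosses.append((i, "golden"))   # short average moves above long average
        elif prev >= 0 > cur:
            crosses.append((i, "death"))    # short average moves below long average
    return diffs, crosses


# Hypothetical closing prices: a fall, a rally, then another fall.
prices = [10, 9, 8, 7, 6, 7, 9, 12, 15, 14, 11, 8, 5, 3]
diffs, crosses = find_crosses(prices, short_window=2, long_window=4)
for day, kind in crosses:
    print(f"day {day}: {kind} cross")
```

With daily data the difference rarely lands on exactly 0, so the sketch looks for a sign change between consecutive days instead of testing for equality.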

[Figure: AAPL closing prices]

How to use

To run our web app, install a container engine for your operating system (such as Docker or Podman) and pull our container image by running:

docker pull u3ebmgske4udqutxkw8rkn/capstone-project

If this doesn't work, clone this repository, navigate into the 'Docker' directory, and run the create-docker.sh script (only tested on Unix-like operating systems). This creates a Docker image called capstone-project, which you can then run with:

docker run --rm -p 3838:3838 localhost/capstone-project:latest

This starts the Shiny server on http://127.0.0.1:3838
