Project for Massively Parallel Machine Learning at UPM, 2016/2017.
In this project, Linear Regression and Logistic Regression are implemented on top of Spark and Spark SQL.
-
Linear Regression is fitted with the Normal Equation and evaluated with RMSE (Root Mean Squared Error).
-
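
As a rough illustration of the Normal Equation and RMSE, here is a minimal plain-Python sketch; it is not the project's Spark code. The names `normal_equation`, `solve`, and `rmse`, and the toy data, are assumptions made for this example. It accumulates `X^T X` and `X^T y` one row at a time, which is the part that would be distributed as a map/reduce over rows in Spark, and then solves the small resulting system locally.

```python
import math

def solve(a, b):
    """Solve a * theta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    theta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (m[i][n] - s) / m[i][i]
    return theta

def normal_equation(rows):
    """rows: list of (features, label); features include a leading 1.0 for the intercept."""
    d = len(rows[0][0])
    xtx = [[0.0] * d for _ in range(d)]
    xty = [0.0] * d
    # Each row contributes independently to X^T X and X^T y,
    # so these sums could be computed per partition and then merged.
    for x, y in rows:
        for i in range(d):
            xty[i] += x[i] * y
            for j in range(d):
                xtx[i][j] += x[i] * x[j]
    return solve(xtx, xty)

def rmse(rows, theta):
    """Root mean squared error of the linear model theta on rows."""
    squared = [(sum(t * xi for t, xi in zip(theta, x)) - y) ** 2 for x, y in rows]
    return math.sqrt(sum(squared) / len(rows))

# Toy data generated from y = 1 + 2x (illustrative only).
rows = [([1.0, float(x)], 1.0 + 2.0 * x) for x in range(5)]
theta = normal_equation(rows)
error = rmse(rows, theta)
```

On this noise-free toy data the fitted `theta` recovers the intercept and slope used to generate it, so the RMSE is close to zero.
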
Logistic Regression is fitted with Gradient Descent or Newton's method and evaluated with a Confusion Matrix.
-
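
The Gradient Descent variant and the Confusion Matrix can be sketched in plain Python as follows. This is only an illustration under assumptions: the function names, the learning rate, the iteration count, and the toy data are made up for the example and are not taken from the project. Newton's method is not shown.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train_logistic(rows, lr=0.5, iterations=500):
    """Batch gradient descent on the mean log-loss.

    rows: list of (features, label) with label in {0, 1};
    features include a leading 1.0 for the intercept.
    """
    d = len(rows[0][0])
    w = [0.0] * d
    for _ in range(iterations):
        grad = [0.0] * d
        # The gradient is a sum over rows, so it could be computed
        # per partition and merged in a distributed setting.
        for x, y in rows:
            err = predict_proba(w, x) - y
            for i in range(d):
                grad[i] += err * x[i]
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def confusion_matrix(rows, w, threshold=0.5):
    """Count true/false positives and negatives at the given threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for x, y in rows:
        predicted = 1 if predict_proba(w, x) >= threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == 0 else "fn"] += 1
    return counts

# Toy, linearly separable data: label is 1 when x > 0 (illustrative only).
rows = [([1.0, float(x)], 1 if x > 0 else 0) for x in (-3, -2, -1, 1, 2, 3)]
w = train_logistic(rows)
cm = confusion_matrix(rows, w)
```

Metrics such as accuracy, precision, and recall can be derived from the four counts in `cm`.
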
Naive and Efficient Cross Validation are also implemented in Spark.
For more detail, see the SlideShare presentation "Linear and Logistic Regression on Spark".
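
The README does not say how the two cross-validation variants differ, so the sketch below is only one plausible reading, in plain Python rather than Spark. It assumes that "naive" means rescanning the training rows for every fold, and that "efficient" means computing per-fold sufficient statistics in a single pass and obtaining each training set's statistics by subtracting the held-out fold from the totals. To keep it short, the "model" is just the mean of the training values; all names and data are hypothetical.

```python
import math

def fold_of(index, k):
    """Assign a row to a fold by its position (an assumption for this sketch)."""
    return index % k

def rmse(values, prediction):
    return math.sqrt(sum((v - prediction) ** 2 for v in values) / len(values))

def naive_cv(values, k):
    """Rescan the data to rebuild the training set for every fold."""
    scores = []
    for fold in range(k):
        train = [v for i, v in enumerate(values) if fold_of(i, k) != fold]
        test = [v for i, v in enumerate(values) if fold_of(i, k) == fold]
        model = sum(train) / len(train)  # mean predictor
        scores.append(rmse(test, model))
    return scores

def efficient_cv(values, k):
    """Compute per-fold sums once, then derive each training mean by subtraction."""
    sums = [0.0] * k
    counts = [0] * k
    for i, v in enumerate(values):  # single pass over the data
        sums[fold_of(i, k)] += v
        counts[fold_of(i, k)] += 1
    total_sum, total_count = sum(sums), sum(counts)
    scores = []
    for fold in range(k):
        model = (total_sum - sums[fold]) / (total_count - counts[fold])
        test = [v for i, v in enumerate(values) if fold_of(i, k) == fold]
        scores.append(rmse(test, model))
    return scores

values = [float(v) for v in range(1, 13)]  # illustrative data
naive_scores = naive_cv(values, 3)
efficient_scores = efficient_cv(values, 3)
```

Both functions yield the same per-fold scores. The same subtraction trick would carry over to the Normal Equation, since `X^T X` and `X^T y` are also sums over rows.
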