This project implements various types of recommender system techniques using Python3.
It implements three primary recommender techniques: collaborative filtering, SVD and CUR matrix decomposition. Each one also implements the global baseline technique. The systems are evaluated using root-mean-square error (RMSE), precision on top K, and Spearman rank correlation.
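The global baseline technique is usually stated as estimating user x's rating of movie i as the overall mean rating μ, plus the user's deviation b_x from that mean, plus the movie's deviation b_i. The following is a minimal NumPy sketch of that standard formula. It assumes a dense users × movies matrix in which 0 marks a missing rating, and `global_baseline` is a hypothetical name; the project's own implementation may differ.

```python
import numpy as np


def global_baseline(ratings: np.ndarray) -> np.ndarray:
    """Baseline estimate b_xi = mu + b_x + b_i for every (user x, movie i) cell.

    Assumption: `ratings` is a users x movies matrix in which 0 marks a missing rating.
    """
    rated = ratings > 0
    mu = ratings[rated].mean()  # global mean over observed ratings only
    user_counts = rated.sum(axis=1)
    movie_counts = rated.sum(axis=0)
    # Mean of each user's / movie's observed ratings; fall back to mu when there are none
    user_means = np.divide(ratings.sum(axis=1), user_counts,
                           out=np.full(ratings.shape[0], mu), where=user_counts > 0)
    movie_means = np.divide(ratings.sum(axis=0), movie_counts,
                            out=np.full(ratings.shape[1], mu), where=movie_counts > 0)
    b_user = user_means - mu    # how much more generously each user rates than average
    b_movie = movie_means - mu  # how much better each movie is rated than average
    return mu + b_user[:, None] + b_movie[None, :]
```

The baseline can serve as a prediction on its own, or it can be combined with the deviations predicted by a similarity-based method.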
Python 3 along with the nltk library is required to run this program.
data_processing.py : Accesses the movie and ratings datasets and processes them for use in the various recommender systems.
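A minimal sketch of this kind of preprocessing is shown below. It assumes ratings arrive as `user::movie::rating::timestamp` lines (the MovieLens 1M layout; the dataset and file format this project actually uses are not stated here) with 1-based integer IDs. `build_utility_matrix`, `save_matrix` and `load_matrix` are hypothetical helpers; pickle is used only because it appears among the project's dependencies.

```python
import pickle
from typing import Iterable

import numpy as np


def build_utility_matrix(lines: Iterable[str], sep: str = "::") -> np.ndarray:
    """Turn 'user<sep>movie<sep>rating[<sep>...]' lines into a users x movies matrix.

    Assumes 1-based integer IDs; movies a user has not rated stay 0.
    """
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, movie, rating = line.split(sep)[:3]
        triples.append((int(user), int(movie), float(rating)))
    n_users = max(u for u, _, _ in triples)
    n_movies = max(m for _, m, _ in triples)
    matrix = np.zeros((n_users, n_movies))
    for user, movie, rating in triples:
        matrix[user - 1, movie - 1] = rating  # 1-based IDs -> 0-based indices
    return matrix


def save_matrix(matrix: np.ndarray, path: str) -> None:
    """Cache the processed matrix so each recommender can load it without re-parsing."""
    with open(path, "wb") as f:
        pickle.dump(matrix, f)


def load_matrix(path: str) -> np.ndarray:
    """Load a matrix previously written by save_matrix."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Sizing the matrix by the largest ID keeps the sketch short; a dataset with sparse or non-numeric IDs would need an explicit ID-to-index mapping instead.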
collaborative_filtering.py : Implements the collaborative filtering recommender model. It is divided into three parts: precision on top K, Spearman ranking and RMSE.
- Precision on top K uses Pearson correlation to measure the similarity between the test user and the other users, predicts ratings accordingly, and then calculates the precision among the top K ratings.
- Spearman ranking uses Spearman rank correlation to predict the rating of a test user for a test movie.
- RMSE uses Pearson correlation to compute user similarities and predicts ratings for a test set of randomly selected users that make up 20% of the user database. It then computes the RMSE over all predicted ratings and displays the final value.
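The sketch below illustrates user-user collaborative filtering with Pearson correlation, together with the three evaluation measures. It is a minimal version under stated assumptions, not the project's code: ratings are a dense users × movies NumPy matrix with 0 for unrated entries, Pearson correlation is computed only over co-rated movies, and only positively correlated neighbours contribute to a prediction. When evaluating, the rating being predicted should be hidden (set to 0) first. `spearman` here compares two score vectors (for instance predicted vs. actual ratings) with the closed form 1 − 6Σd²/(n(n²−1)), which is exact only without ties; SciPy's `scipy.stats.spearmanr` handles ties. All function names are hypothetical.

```python
import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two users over the movies both rated (0 = unrated)."""
    common = (a > 0) & (b > 0)
    if common.sum() < 2:
        return 0.0  # too little overlap to measure similarity
    x = a[common] - a[common].mean()
    y = b[common] - b[common].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0


def predict(ratings: np.ndarray, user: int, movie: int, k: int = 10) -> float:
    """Similarity-weighted average over the k most similar users who rated `movie`."""
    neighbours = [
        (pearson(ratings[user], ratings[other]), other)
        for other in range(ratings.shape[0])
        if other != user and ratings[other, movie] > 0
    ]
    # Keep the k most similar neighbours, ignoring non-positive correlations
    top = [(s, o) for s, o in sorted(neighbours, reverse=True)[:k] if s > 0]
    if not top:
        rated = ratings[user][ratings[user] > 0]
        return float(rated.mean()) if rated.size else 0.0  # fall back to the user's mean
    weights = np.array([s for s, _ in top])
    values = np.array([ratings[o, movie] for _, o in top])
    return float(weights @ values / weights.sum())


def rmse(actual, predicted) -> float:
    """Root-mean-square error between actual and predicted ratings."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def precision_at_k(scores, relevant, k: int) -> float:
    """Fraction of the k highest-scored movies that are in the `relevant` set."""
    top_k = np.argsort(scores)[::-1][:k]
    return len(set(top_k.tolist()) & set(relevant)) / k


def spearman(x, y) -> float:
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); assumes no ties and n >= 2."""
    rank_x = np.argsort(np.argsort(x))  # rank of each element (0 = smallest)
    rank_y = np.argsort(np.argsort(y))
    d = rank_x - rank_y
    n = len(x)
    return float(1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1)))
```

How "relevant" movies are chosen for precision on top K (for example, movies the user actually rated above some threshold) is left to the caller in this sketch.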
SVD.py : Recommends movies to users using singular value decomposition (SVD). It contains the following methods:
- SVD: Computes the singular value decomposition of the given matrix.
- Query: Queries the SVD matrices with a given query vector.
- RMSE: Calculates the root-mean-square error (RMSE) as defined in the textbook for UV decomposition.
- Energy: Calculates the energy of a matrix, defined as the sum of the squares of its elements.
- Precision_top_k: Calculates the precision of the top K elements, given a query and the V matrix of the SVD.
- spearmanCoefficient: Uses Spearman rank correlation to predict the rating of a test user for a test movie.
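A minimal NumPy sketch of the SVD steps listed above follows, assuming a dense users × movies matrix. Because the energy of a matrix equals the sum of its squared singular values, a "90% energy" variant can keep the fewest singular values whose squares reach 90% of that total. `query` follows the common concept-space approach: project the query onto the concepts (q·V), then map back to movie scores (q·V·Vᵀ). The function names are hypothetical and the project's exact choices may differ.

```python
import numpy as np


def svd(matrix: np.ndarray):
    """Thin SVD: matrix == U @ np.diag(sigma) @ Vt."""
    return np.linalg.svd(matrix, full_matrices=False)


def energy(values: np.ndarray) -> float:
    """Energy = sum of the squares of the elements."""
    return float(np.sum(np.square(values)))


def truncate_to_energy(U, sigma, Vt, retain: float = 0.9):
    """Keep the fewest singular values whose squares hold `retain` of the total energy."""
    cumulative = np.cumsum(sigma ** 2) / energy(sigma)
    k = int(np.searchsorted(cumulative, retain)) + 1  # first index reaching `retain`, plus one
    return U[:, :k], sigma[:k], Vt[:k, :]


def query(q: np.ndarray, Vt: np.ndarray) -> np.ndarray:
    """Map a user's ratings into concept space (q V), then back to movie scores (q V V^T)."""
    concepts = q @ Vt.T  # strength of each concept for this user
    return concepts @ Vt
```

Sorting the scores returned by `query` and taking the top K gives the candidate recommendations that a precision-on-top-K check would evaluate.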
CUR.py : Recommends movies to users using CUR decomposition. It contains the following methods:
- CUR: Decomposes the given matrix into the three matrices C, U and R.
- rmse: Calculates the root-mean-square error (RMSE) as defined in the textbook for UV decomposition.
- query: Takes a vector of a user's previous ratings and predicts which movies to recommend to that user.
- precisionTopK: Uses Pearson correlation to measure the similarity between the test user and the other users, predicts ratings accordingly, and then calculates the precision among the top K ratings.
- Spearman ranking: Uses Spearman rank correlation to predict the rating of a test user for a test movie.
- Energy: A utility function that calculates the energy of the decomposition.
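The sketch below shows one common form of CUR decomposition, under stated assumptions rather than as the project's implementation: columns and rows are sampled without replacement with probability proportional to their squared norms, C and R are the sampled columns and rows, and U is the Moore-Penrose pseudo-inverse of their intersection W. Some textbook formulations also rescale each sampled column and row before forming C and R; this sketch omits that step. The optional `retain` parameter drops the smallest singular values of W to keep a given fraction of its energy. `cur` and `_sample` are hypothetical names.

```python
import numpy as np


def _sample(weights: np.ndarray, count: int, rng: np.random.Generator) -> np.ndarray:
    """Draw `count` distinct indices with probability proportional to `weights`."""
    probs = weights / weights.sum()
    count = min(count, int(np.count_nonzero(probs)))  # cannot draw more non-zero entries than exist
    return rng.choice(len(probs), size=count, replace=False, p=probs)


def cur(matrix: np.ndarray, c: int, r: int, retain: float = 1.0, seed: int = 0):
    """Approximate matrix ~= C @ U @ R from c sampled columns and r sampled rows.

    `retain` < 1 drops the smallest singular values of W (e.g. 0.9 keeps 90% energy).
    """
    rng = np.random.default_rng(seed)
    squared = matrix ** 2
    cols = _sample(squared.sum(axis=0), c, rng)  # weights = column squared norms
    rows = _sample(squared.sum(axis=1), r, rng)  # weights = row squared norms
    C = matrix[:, cols]
    R = matrix[rows, :]
    W = matrix[np.ix_(rows, cols)]  # intersection of the chosen rows and columns

    # U is the pseudo-inverse of W: W = X S Y^T  =>  W^+ = Y S^+ X^T
    X, sigma, Yt = np.linalg.svd(W, full_matrices=False)
    keep = sigma > 1e-10 * sigma[0]  # treat tiny singular values as zero
    if retain < 1.0 and sigma[0] > 0:
        cumulative = np.cumsum(sigma ** 2) / np.sum(sigma ** 2)
        keep[int(np.searchsorted(cumulative, retain)) + 1:] = False
    sigma_plus = np.zeros_like(sigma)
    sigma_plus[keep] = 1.0 / sigma[keep]
    U = Yt.T @ np.diag(sigma_plus) @ X.T
    return C, U, R
```

When the sampled intersection W has the same rank as the full matrix, C @ U @ R reproduces the matrix exactly; with fewer samples or `retain` < 1 it becomes an approximation.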
$ python data_processing.py </br>
$ python collaborative_filtering.py </br>
$ python SVD.py </br>
$ python CUR.py
System | RMSE | Precision on top K | Spearman rank correlation | Time |
---|---|---|---|---|
Collaborative filtering | 0.9 | 72% | 40% | 1.823 s |
Collaborative filtering (baseline) | 3.6 | 65% | 26% | 1.823 s |
SVD | 0.18 | 78% | 85% | 11.49 ms |
SVD (90% energy) | 4.00 | 81% | 87% | 0.069 ms |
CUR | 0.112925 | 89.9% | 100% | 58.926 ms |
CUR (90% energy) | 0.11166 | 89.9% | 100% | 51.930 ms |
The project uses:
- Python 3
- NumPy
- SciPy
- os
- pickle
- math
Chandrahas Aroori [https://github.com/Exorust]
Naren Surampudi [https://github.com/nsurampu]
Aditya Srikanth [https://github.com/aditya-srikanth]
We'd like to thank our Information Retrieval instructor for giving us the opportunity to build this project.