Find similar documents

Scalable PySpark implementation of an algorithm to retrieve similar documents in a corpus.
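This README does not say which similarity measure or algorithm the notebook uses. As a rough illustration of what "similar documents" can mean, the sketch below assumes word shingles compared with Jaccard similarity, in plain Python rather than PySpark. The names `shingles`, `jaccard`, and `similar_pairs`, along with the `k` and `threshold` parameters, are hypothetical and are not taken from the notebook.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (hypothetical helper)."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words yield a single shingle, or none if empty.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(docs, threshold=0.5, k=3):
    """Return (id1, id2, similarity) for every pair at or above the threshold.

    Assumption: docs maps a document id to its text. Every pair is compared,
    so the cost grows quadratically with the number of documents.
    """
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    pairs = []
    for i, j in combinations(sorted(sets), 2):
        score = jaccard(sets[i], sets[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    corpus = {
        "a": "the quick brown fox jumps",
        "b": "the quick brown fox leaps",
        "c": "completely different text here now",
    }
    print(similar_pairs(corpus))
```

Because this sketch compares all pairs in memory, it only illustrates the similarity measure; it is not a substitute for the scalable approach implemented in the notebook.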

This project was submitted as the final assignment for the Algorithms for Massive Data class, MSc in Data Science and Economics, University of Milan.

The notebook was run on Google Colab. Commenter privileges have been granted to anyone accessing the notebook via the link.

