Testing various unsupervised topic modeling algorithms on the Medium articles dataset.
Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus.
Link to the Kaggle dataset: Dataset
Link to the Kaggle kernel: Kernel
- LDA (Latent Dirichlet Allocation)
- NMF (Non-negative Matrix Factorization)
- LSA (Latent Semantic Analysis)
The code in this repository will not run on your machine straight after cloning. First install all the dependencies listed in the requirements.txt file. Then download the dataset from the link above and change the working directory inside the notebook to the one of your choice.
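
As a small sketch of the working-directory step, a helper like the one below could go in the first notebook cell. The function name `use_data_dir` and the example path are hypothetical; adapt them to wherever you saved the dataset.

```python
import os
from pathlib import Path


def use_data_dir(path):
    """Switch the working directory to `path` and return the names of the files in it."""
    data_dir = Path(path).expanduser().resolve()
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {data_dir}")
    os.chdir(data_dir)
    return sorted(p.name for p in data_dir.iterdir() if p.is_file())


# Example call (hypothetical path, replace with your own download location):
# use_data_dir("~/datasets/medium-articles")
```

Failing early with a clear error when the directory is missing tends to be easier to debug than a later file-not-found error from the data loading code.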