partha2000/Topic_modeling-on-Medium_dataset

Topic modeling on Medium articles dataset

Testing various unsupervised topic modeling algorithms on the Medium articles dataset.

Description:

Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus.

Link to the Kaggle dataset: Dataset

Link to the Kaggle kernel: Kernel

To be tested:

  • LDA (Latent Dirichlet Allocation)
  • NMF (Non-negative Matrix Factorization)
  • LSA (Latent Semantic Analysis)

How to reproduce:

The code sample in this repository will not run straight after cloning. First install the dependencies listed in the requirements.txt file (for example with pip install -r requirements.txt). Then download the dataset from the link above and change the working directory inside the notebook to the folder where you saved it.
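
One possible way to avoid editing a hard-coded working directory is to read the dataset location from a setting. The sketch below uses only the standard library. The environment variable `MEDIUM_DATA_DIR`, the file name `articles.csv` and the `load_articles` helper are hypothetical names chosen for the example, not names taken from the notebook or the dataset.

```python
import csv
import os
from pathlib import Path

# Hypothetical settings: adjust to wherever the Kaggle download was saved.
DATA_DIR = Path(os.environ.get("MEDIUM_DATA_DIR", "."))
CSV_NAME = "articles.csv"  # assumed file name, check the actual download


def load_articles(data_dir=DATA_DIR, csv_name=CSV_NAME):
    """Read the dataset CSV from data_dir and return its rows as dicts."""
    path = Path(data_dir) / csv_name
    if not path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {path}; download it first or set MEDIUM_DATA_DIR."
        )
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `load_articles()` in the first notebook cell then works from any directory once `MEDIUM_DATA_DIR` points at the download folder. If the article bodies are very long, the `csv` module's default field size limit may need to be raised with `csv.field_size_limit`.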
