alexmemory/cl2-lda

cl2-lda

LDA Topic Prediction

Running (dependencies: numpy, scipy, sqlite3, Mallet)

Mallet must be downloaded separately. numpy and scipy can be installed with pip or by installing the Enthought Python distribution; sqlite3 ships with Python's standard library.
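Before starting, it can help to confirm that the dependencies are actually visible to Python. The following is a minimal sketch, not part of this repository: the function names `missing_modules` and `mallet_available` are hypothetical, and the Mallet path is whatever location you installed it to.

```python
import importlib.util
import shutil
from pathlib import Path


def missing_modules(names=("numpy", "scipy", "sqlite3")):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def mallet_available(mallet_path):
    """Return True if `mallet_path` is an existing file or a command on PATH."""
    return Path(mallet_path).is_file() or shutil.which(mallet_path) is not None


if __name__ == "__main__":
    # Hypothetical usage: substitute the real location of your Mallet launcher.
    print("Missing Python modules:", missing_modules())
    print("Mallet found:", mallet_available("<path/to>/bin/mallet"))
```

An empty list from `missing_modules()` means all three Python dependencies are available for import.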

  1. Edit input_example.json so that the file paths for the October 3rd debate and the presidential debate corpus are correct for your computer.
  2. Preprocess the data: python format.py debates input_example.json
  3. Use Mallet to run LDA on the preprocessed data. First import the documents, then train the topic model:
       <path/to>/bin/mallet import-dir --input debates/ --output topic-input.mallet --keep-sequence --remove-stopwords
       <path/to>/bin/mallet train-topics --input topic-input.mallet --num-topics 19 --output-state topic-state.gz --output-doc-topics debates.doc.topics --output-topic-keys debate.keys --optimize-interval 10
  4. Edit database.py so that the variable REACTIONS_FNAME points to the file with the ReactLabs reactions on your computer.
  5. Create the reactions database: python database.py create
  6. Generate the SVMlight-style inputs for Mallet: python svmlitegen.py
  7. Run Mallet on the output of the last step. For each file generated by svmlitegen.py (task1obama.train, etc.), import it and then train the classifiers:
       <path/to>/bin/mallet import-svmlight --input task1obama.train --output train.mallet
       <path/to>/bin/mallet train-classifier --input train.mallet --output-classifier naivebayes.classifier --trainer DecisionTree --trainer NaiveBayes --trainer MaxEnt --training-portion 0.9 --num-trials 10 --cross-validation 10
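The Mallet invocations in steps 3 and 7 can be scripted rather than typed by hand, which is convenient because step 7 repeats for every generated training file. The sketch below is illustrative and not part of this repository. The helper names (`lda_commands`, `classifier_commands`, `run_all`) are hypothetical. It passes the corpus directory to `import-dir` through `--input` and uses Mallet's `import-svmlight` importer. It also derives a separate output name per training file, which is an assumption made so that one run does not overwrite another; the steps above reuse train.mallet and naivebayes.classifier.

```python
import subprocess
from pathlib import Path


def lda_commands(mallet, corpus_dir="debates/", num_topics=19):
    """Build the two step-3 commands (import, then train topics) as argument lists."""
    imported = "topic-input.mallet"
    return [
        [mallet, "import-dir", "--input", corpus_dir, "--output", imported,
         "--keep-sequence", "--remove-stopwords"],
        [mallet, "train-topics", "--input", imported,
         "--num-topics", str(num_topics),
         "--output-state", "topic-state.gz",
         "--output-doc-topics", "debates.doc.topics",
         "--output-topic-keys", "debate.keys",
         "--optimize-interval", "10"],
    ]


def classifier_commands(mallet, train_file):
    """Build the two step-7 commands for one file produced by svmlitegen.py."""
    stem = Path(train_file).stem          # "task1obama.train" -> "task1obama"
    vectors = f"{stem}.mallet"            # assumed per-file name, see lead-in
    return [
        [mallet, "import-svmlight", "--input", train_file, "--output", vectors],
        [mallet, "train-classifier", "--input", vectors,
         "--output-classifier", f"{stem}.classifier",
         "--trainer", "DecisionTree",
         "--trainer", "NaiveBayes",
         "--trainer", "MaxEnt",
         "--training-portion", "0.9",
         "--num-trials", "10",
         "--cross-validation", "10"],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it and stop on the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run_all(lda_commands("<path/to>/bin/mallet"))` prints the step-3 commands without running them; pass `dry_run=False` to execute. For step 7, loop over the training files and call `run_all(classifier_commands(mallet, name))` for each one.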
