cl2-lda

LDA Topic Prediction

Running (dependencies: numpy, scipy, sqlite3, Mallet)

Mallet can be obtained here. The other dependencies can be installed with pip or by installing the Enthought Python distribution.
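Before starting, it can help to confirm that the Python dependencies are importable. The snippet below is a minimal sketch, not part of this repository; the helper name `missing_modules` is hypothetical, and it only checks that each module can be located, not which version is installed.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be located by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Python dependencies listed in this README (Mallet is a separate Java tool).
    missing = missing_modules(["numpy", "scipy", "sqlite3"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All Python dependencies found.")
```

Mallet itself is not covered by this check, since it is run through its own `bin/mallet` launcher.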

  1. Edit input_example.json so that the file paths for the October 3rd debate and the presidential debate corpus are correct for your computer.
  2. Preprocess the data: python format.py debates input_example.json
  3. Use Mallet to run LDA on the preprocessed data:
     <path/to>/bin/mallet import-dir debates/ --output topic-input.mallet --keep-sequence --remove-stopwords
     <path/to>/bin/mallet train-topics --input topic-input.mallet --num-topics 19 --output-state topic-state.gz --output-doc-topics debates.doc.topics --output-topic-keys debate.keys --optimize-interval 10
  4. Edit database.py so that the variable REACTIONS_FNAME points to the file with the ReactLabs reactions on your computer.
  5. Create the reactions database: python database.py create
  6. Generate the SVMlight-style inputs for Mallet: python svmlitegen.py
  7. Run Mallet on the output of the previous step. For each file generated by svmlitegen.py (task1obama.train, etc.):
     <path/to>/bin/mallet import-svmlight --input task1obama.train --output train.mallet
     <path/to>/bin/mallet train-classifier --input train.mallet --output-classifier naivebayes.classifier --trainer DecisionTree --trainer NaiveBayes --trainer MaxEnt --training-portion 0.9 --num-trials 10 --cross-validation 10
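
Since the last step has to be repeated for every file that svmlitegen.py produces, a small driver script can save some typing. The following is a minimal sketch and not part of this repository. It assumes the generated files end in `.train` and sit in the current directory. The names `mallet_commands` and `run_all` are hypothetical. Unlike the commands above, it writes a separate `.mallet` and `.classifier` file per input so that runs do not overwrite each other. Mallet's SVMlight importer is spelled `import-svmlight`. Replace the `MALLET` placeholder with the path to your Mallet installation.

```python
import glob
import subprocess

# Placeholder: replace with the path to your Mallet launcher.
MALLET = "<path/to>/bin/mallet"


def mallet_commands(train_file, mallet=MALLET):
    """Build the import and train-classifier commands for one training file."""
    stem = train_file.rsplit(".", 1)[0]
    instances = stem + ".mallet"        # assumption: one instance file per input
    classifier = stem + ".classifier"   # assumption: one classifier file per input
    import_cmd = [
        mallet, "import-svmlight",
        "--input", train_file,
        "--output", instances,
    ]
    train_cmd = [
        mallet, "train-classifier",
        "--input", instances,
        "--output-classifier", classifier,
        "--trainer", "DecisionTree",
        "--trainer", "NaiveBayes",
        "--trainer", "MaxEnt",
        "--training-portion", "0.9",
        "--num-trials", "10",
        "--cross-validation", "10",
    ]
    return [import_cmd, train_cmd]


def run_all(pattern="*.train", mallet=MALLET):
    """Run both Mallet commands for every file matching the pattern."""
    for train_file in sorted(glob.glob(pattern)):
        for cmd in mallet_commands(train_file, mallet):
            # check=True stops at the first command that exits with an error.
            subprocess.run(cmd, check=True)
```

Calling `run_all()` from the directory that holds the generated files processes them in alphabetical order; `mallet_commands("task1obama.train")` can be used on its own to inspect the commands without running them.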