cl2-lda

LDA Topic Prediction

Mallet is available from the MALLET project website. The other dependencies can be installed with pip or by installing the Enthought Python Distribution.

Edit input_example.json so that the file paths for the October 3rd debate and the presidential debate corpus are correct for your computer.
Preprocess the data: python format.py debates input_example.json
Use Mallet to run LDA on the preprocessed data:
    <path/to>/bin/mallet import-dir --input debates/ --output topic-input.mallet --keep-sequence --remove-stopwords
    <path/to>/bin/mallet train-topics --input topic-input.mallet --num-topics 19 --output-state topic-state.gz --output-doc-topics debates.doc.topics --output-topic-keys debate.keys --optimize-interval 10
Edit database.py so that the variable REACTIONS_FNAME points to the file with the ReactLabs reactions on your computer.
Create the reactions database: python database.py create
Generate the SVMlight-style inputs for Mallet: python svmlitegen.py
To run Mallet on the output of the last step, run the following for each file generated by svmlitegen.py (task1obama.train, etc.):
    <path/to>/bin/mallet import-svmlight --input task1obama.train --output train.mallet
    <path/to>/bin/mallet train-classifier --input train.mallet --output-classifier naivebayes.classifier --trainer DecisionTree --trainer NaiveBayes --trainer MaxEnt --training-portion 0.9 --num-trials 10 --cross-validation 10
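
The LDA step writes per-document topic proportions to debates.doc.topics. The sketch below shows one way to read such a file in Python. It is not part of this repository, and it rests on an assumption about the file layout: depending on the Mallet version, each row is either "index name p0 p1 ..." (one proportion per topic) or "index name topic proportion topic proportion ..." (pairs). The function name parse_doc_topics and the sample rows are made up for illustration, and document names containing spaces are not handled.

```python
def parse_doc_topics(lines, num_topics):
    """Return {document name: [proportion of topic 0, topic 1, ...]}.

    Assumes one of two row layouts (an assumption, not checked against
    any particular Mallet release):
      dense : index name p0 p1 ... p(n-1)
      pairs : index name topic proportion topic proportion ...
    """
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        fields = line.split()
        name, rest = fields[1], fields[2:]
        if len(rest) == num_topics:
            # Dense layout: proportions are already in topic order.
            proportions = [float(x) for x in rest]
        elif len(rest) == 2 * num_topics:
            # Pair layout: place each proportion at its topic id.
            proportions = [0.0] * num_topics
            for i in range(0, len(rest), 2):
                proportions[int(rest[i])] = float(rest[i + 1])
        else:
            raise ValueError("unexpected number of fields for %s" % name)
        result[name] = proportions
    return result


# Made-up sample rows with 3 topics, one per layout.
dense_rows = ["#doc name topic proportion ...",
              "0\tfile:/debates/a.txt\t0.2\t0.5\t0.3"]
pair_rows = ["0 file:/debates/a.txt 1 0.5 2 0.3 0 0.2"]

dense = parse_doc_topics(dense_rows, 3)
pairs = parse_doc_topics(pair_rows, 3)

# Index of the highest-weighted topic for the sample document.
weights = dense["file:/debates/a.txt"]
dominant = max(range(len(weights)), key=lambda t: weights[t])
```

For the real output, something like parse_doc_topics(open("debates.doc.topics"), 19) would match the --num-topics 19 setting used above.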
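
For readers unfamiliar with the format that svmlitegen.py targets, here is a minimal sketch of how a single SVMlight-style line is laid out: a label, followed by index:value pairs in increasing index order. What svmlitegen.py actually uses as labels and features is not described here, so the function name svmlight_line and the sample values are purely illustrative.

```python
def svmlight_line(label, features):
    """Format one example as '<label> <index>:<value> ...'.

    `features` maps integer feature indices to numeric values; the
    pairs are written in increasing index order.
    """
    parts = [str(label)]
    for index in sorted(features):
        parts.append("%d:%g" % (index, features[index]))
    return " ".join(parts)


# Illustrative values only, not taken from the real data.
example = svmlight_line("+1", {3: 0.5, 1: 0.25})
```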
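
Because the classifier step is repeated for every file that svmlitegen.py generates, it may be convenient to script it. The following is a rough sketch rather than part of this repository. It reuses the flags from the commands above and uses Mallet's import-svmlight subcommand. The helper names, and the choice to derive a separate .mallet and .classifier file name from each training file (instead of reusing train.mallet and naivebayes.classifier), are assumptions made here so that runs do not overwrite each other.

```python
import subprocess


def classifier_commands(mallet_bin, train_file):
    """Build the import and training commands for one training file."""
    mallet_file = train_file + ".mallet"          # hypothetical naming
    classifier_file = train_file + ".classifier"  # hypothetical naming
    import_cmd = [mallet_bin, "import-svmlight",
                  "--input", train_file,
                  "--output", mallet_file]
    train_cmd = [mallet_bin, "train-classifier",
                 "--input", mallet_file,
                 "--output-classifier", classifier_file,
                 "--trainer", "DecisionTree",
                 "--trainer", "NaiveBayes",
                 "--trainer", "MaxEnt",
                 "--training-portion", "0.9",
                 "--num-trials", "10",
                 "--cross-validation", "10"]
    return [import_cmd, train_cmd]


def run_all(mallet_bin, train_files):
    """Run both commands for each file, stopping at the first failure."""
    for train_file in train_files:
        for cmd in classifier_commands(mallet_bin, train_file):
            subprocess.check_call(cmd)


# Example call (replace the path and list the files you generated):
# run_all("<path/to>/bin/mallet", ["task1obama.train"])
```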