Mallet can be obtained here. The other dependencies can be installed with pip or by installing the Enthought Python Distribution.
- Edit input_example.json so that the file paths for the October 3rd debate and the presidential debate corpus are correct for your computer.
- Preprocess the data:
python format.py debates input_example.json
- Use Mallet to run LDA on the preprocessed data.
<path/to>/bin/mallet import-dir debates/ --output topic-input.mallet --keep-sequence --remove-stopwords
<path/to>/bin/mallet train-topics --input topic-input.mallet --num-topics 19 --output-state topic-state.gz --output-doc-topics debates.doc.topics --output-topic-keys debate.keys --optimize-interval 10
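The `--output-topic-keys` file (debate.keys) lists the top words for each of the 19 topics. Below is a minimal sketch for reading it in Python. It assumes the tab-separated layout MALLET writes for this file (topic id, Dirichlet parameter, space-separated top words). The function name `read_topic_keys` and the script name `read_keys.py` are hypothetical and not part of this repository.

```python
import sys
from typing import Dict, Iterable, List, Tuple


def read_topic_keys(lines: Iterable[str]) -> Dict[int, Tuple[float, List[str]]]:
    """Map topic id -> (Dirichlet parameter, top words) from MALLET topic-keys lines."""
    topics: Dict[int, Tuple[float, List[str]]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Assumed layout: "<topic id>\t<alpha>\t<word> <word> ..."
        topic_id, alpha, words = line.rstrip("\n").split("\t", 2)
        topics[int(topic_id)] = (float(alpha), words.split())
    return topics


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python read_keys.py debate.keys
    with open(sys.argv[1], encoding="utf-8") as f:
        for tid, (alpha, words) in sorted(read_topic_keys(f).items()):
            print(f"{tid:2d} {alpha:.4f} {' '.join(words[:10])}")
```

Because `--optimize-interval 10` is set, the Dirichlet parameter differs per topic, so it is kept alongside the words rather than discarded.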
- Edit database.py so that the variable REACTIONS_FNAME points to the file with the ReactLabs reactions on your computer.
- Create the reactions database.
python database.py create
- Generate the SVMLight-style inputs for Mallet.
python svmlitegen.py
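SVMLight-style data has one instance per line: a label followed by space-separated `feature:value` pairs. The sketch below only illustrates that line format; it is not the actual code of svmlitegen.py, and the helper name `to_svmlight_line` as well as the example labels and features are hypothetical. It uses string feature names; if your Mallet version expects integer feature ids, map the names to integers first.

```python
from typing import Mapping


def to_svmlight_line(label: str, features: Mapping[str, float]) -> str:
    """Format one instance as 'label feature:value feature:value ...'."""
    # Tokens are separated by whitespace and split on ':', so reject both.
    for name in [label, *features]:
        if not name or any(c.isspace() for c in name) or ":" in name:
            raise ValueError(f"invalid token for SVMLight format: {name!r}")
    # Sort features so the output is deterministic.
    pairs = " ".join(f"{name}:{value:g}" for name, value in sorted(features.items()))
    return f"{label} {pairs}".rstrip()


if __name__ == "__main__":
    # Hypothetical instance: a reaction label plus word counts.
    print(to_svmlight_line("positive", {"tax": 2, "jobs": 1}))
```

Validating tokens up front matters because a stray space or colon in a feature name would silently shift every field after it when Mallet parses the line.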
- To run Mallet on the output of the previous step, run the following for each file generated by svmlitegen.py (task1obama.train, etc.):
<path/to>/bin/mallet import-svmlight --input task1obama.train --output train.mallet
<path/to>/bin/mallet train-classifier --input train.mallet --output-classifier naivebayes.classifier --trainer DecisionTree --trainer NaiveBayes --trainer MaxEnt --training-portion 0.9 --num-trials 10 --cross-validation 10
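Running the two Mallet commands by hand for every file is repetitive, and reusing train.mallet and naivebayes.classifier means each run overwrites the previous one. A minimal sketch of a driver loop follows, assuming the files generated by svmlitegen.py match `*.train` (adjust the pattern if they do not). The script name `run_classifiers.py` and the helpers `mallet_commands` and `run_all` are hypothetical; the Mallet flags are copied from the commands above.

```python
import glob
import os
import subprocess
import sys
from typing import List


def mallet_commands(mallet: str, train_file: str) -> List[List[str]]:
    """Build the import-svmlight and train-classifier commands for one file."""
    # Derive per-file output names so runs do not overwrite each other.
    stem = os.path.splitext(train_file)[0]
    instances = f"{stem}.mallet"
    return [
        [mallet, "import-svmlight", "--input", train_file, "--output", instances],
        [mallet, "train-classifier", "--input", instances,
         "--output-classifier", f"{stem}.classifier",
         "--trainer", "DecisionTree", "--trainer", "NaiveBayes", "--trainer", "MaxEnt",
         "--training-portion", "0.9", "--num-trials", "10", "--cross-validation", "10"],
    ]


def run_all(mallet: str, pattern: str = "*.train") -> None:
    """Run both commands for every matching file, stopping on the first failure."""
    for train_file in sorted(glob.glob(pattern)):
        for cmd in mallet_commands(mallet, train_file):
            subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python run_classifiers.py <path/to>/bin/mallet ["*.train"]
    run_all(sys.argv[1], *sys.argv[2:3])
```

Passing arguments as lists to `subprocess.run` avoids shell quoting issues with paths, and `check=True` stops the loop if Mallet exits with an error instead of silently moving on to the next file.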