Twitter Hashtag Tracking

Motivation

Track specific hashtags or keywords on Twitter and analyze the matching tweets in real time.

Run Example

Configuration

Create your own src/config.json file with your Twitter API credentials.

{ "asecret": "XXX...XXX",
  "atoken":  "XXX...XXX",
  "csecret": "XXX...XXX",
  "ckey":    "XXX...XXX"
}

Modify the conf/parameters.json file to set the parameters.

{ "hashtag": "#overwatch",
  "DStream": { "batch_interval": "60",
               "window_time": "60",
               "process_times": "60" }
}

Suggestion: set batch_interval and window_time to multiples of 60.
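As a rough illustration of how a job might consume this file, the sketch below parses the parameters, converts the quoted numbers to integers, and reports which values do not follow the multiple-of-60 suggestion. The function names `load_parameters`, `parse_parameters`, and `off_suggestion` are hypothetical and are not taken from the project's source.

```python
import json


def parse_parameters(raw):
    """Flatten the parsed JSON and convert the numeric strings to ints."""
    dstream = raw["DStream"]
    return {
        "hashtag": raw["hashtag"],
        "batch_interval": int(dstream["batch_interval"]),
        "window_time": int(dstream["window_time"]),
        "process_times": int(dstream["process_times"]),
    }


def load_parameters(path="conf/parameters.json"):
    """Read conf/parameters.json from disk (path taken from this README)."""
    with open(path, encoding="utf-8") as f:
        return parse_parameters(json.load(f))


def off_suggestion(params):
    """Return the names of interval settings that are not multiples of 60."""
    names = ("batch_interval", "window_time")
    return [n for n in names if params[n] <= 0 or params[n] % 60 != 0]
```

An empty list from `off_suggestion` means both values follow the suggestion; anything else is only a hint, since the README states it as a suggestion rather than a requirement.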

MongoDB Database

Start a mongod process.

sudo mongod
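Before starting the other jobs it can help to confirm that mongod is accepting connections. The sketch below only checks that a TCP connection can be opened; it assumes mongod runs on the same machine on its default port 27017, and the helper name `mongod_is_listening` is hypothetical.

```python
import socket


def mongod_is_listening(host="localhost", port=27017, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, or the host could not be resolved.
        return False
    conn.close()
    return True
```

A `True` result shows only that the port is open, not that the process behind it is MongoDB or that the database is usable.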

Model Training

Run Spark jobs to train a Naive Bayes model for later sentiment analysis.

$SPARK_HOME/bin/spark-submit src/model.py > log/model.log

You can check the accuracy of the trained model in log/model.log:

>>> Accuracy
0.959944108057755
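To make the reported number concrete, here is a small pure-Python sketch of a multinomial Naive Bayes classifier with add-one smoothing, together with the accuracy calculation (correct predictions divided by total examples). It is not the code in src/model.py and does not use Spark MLlib; the function names and the four toy training examples are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples, smoothing=1.0):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return {"label_counts": label_counts, "word_counts": word_counts,
            "vocabulary": vocabulary, "smoothing": smoothing}


def predict(model, tokens):
    """Return the label with the highest log prior plus log likelihood."""
    total = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, count in model["label_counts"].items():
        words = model["word_counts"][label]
        denom = sum(words.values()) + model["smoothing"] * vocab_size
        score = math.log(count / total)
        for token in tokens:
            if token in model["vocabulary"]:  # ignore words never seen in training
                score += math.log((words[token] + model["smoothing"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for tokens, label in examples if predict(model, tokens) == label)
    return correct / len(examples)


# Made-up toy data, only to show the call pattern.
toy_examples = [
    (["love", "this", "game"], "positive"),
    (["great", "update"], "positive"),
    (["hate", "this", "lag"], "negative"),
    (["terrible", "update"], "negative"),
]
toy_model = train_naive_bayes(toy_examples)
```

Accuracy measured on the training examples themselves, as `accuracy(toy_model, toy_examples)` would do, is optimistic; a held-out test set gives a more honest figure.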

Twitter Input

Start the streaming script. It waits for a connection before it begins streaming tweets.

python3.4 src/stream.py

Spark Streaming

Run Spark jobs to do real-time analysis on the tweets.

$SPARK_HOME/bin/spark-submit src/analysis.py > log/analysis.log

Dashboard

Run the data visualization jobs.

python3.4 web/dashboard.py

Process

Twitter API

Use tweepy, a Python client for the Twitter API, to stream tweets.
Keep only the tweets that contain the keywords/hashtag being tracked.
Use a TCP/IP socket to send the fetched tweets to the Spark job.
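The filter-and-forward steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the contents of src/stream.py: the tweets are plain strings standing in for what tweepy would deliver, the host and port are placeholders, and each tweet is sent as one newline-terminated UTF-8 line on the assumption that the Spark side reads the socket as a text stream split on newlines.

```python
import socket


def matches(text, keywords):
    """True if the tweet text contains any tracked keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def send_tweets(conn, tweets, keywords):
    """Send matching tweets over an open socket, one per line; return the count."""
    sent = 0
    for text in tweets:
        if matches(text, keywords):
            # Collapse internal newlines so one tweet stays on one line.
            line = " ".join(text.split()) + "\n"
            conn.sendall(line.encode("utf-8"))
            sent += 1
    return sent


def serve_once(tweets, keywords, host="localhost", port=9999):
    """Wait for one client (for example the Spark job), then send it the tweets."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        server.bind((host, port))
        server.listen(1)
        conn, _ = server.accept()  # blocks until the consumer connects
        try:
            return send_tweets(conn, tweets, keywords)
        finally:
            conn.close()
    finally:
        server.close()
```

Because `serve_once` blocks in `accept()`, the sender starts first and the consumer connects afterwards, which matches the order of the steps in the Run Example section.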

Real-time Analysis

Use Spark Streaming to perform the real-time analysis on the tweets
Count the number of related tweets for each time interval
Preprocess the tweet text
- Remove all punctuation
- Convert capital letters to lower case
- Remove stop words for better performance
Find out the most related keywords
Find out the most related hashtags
Sentiment analysis
- Use Spark MLlib to build a Naive Bayes model
- Classify each tweet as positive or negative
- Training examples from Sanders Analytics
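The counting and preprocessing steps listed above can be illustrated in plain Python, without Spark. This is a sketch only: the stop-word set is a tiny made-up sample, the function names are hypothetical, and the project itself performs these steps as Spark Streaming transformations.

```python
import re
import string
from collections import Counter

# Tiny illustrative stop-word set; a real job would use a fuller list.
STOP_WORDS = {"a", "an", "and", "i", "in", "is", "it", "of", "on", "the", "this", "to"}

# Translation table that deletes every ASCII punctuation character.
_NO_PUNCT = str.maketrans("", "", string.punctuation)


def extract_hashtags(text):
    """Return lower-cased hashtags; call this before punctuation is removed."""
    return re.findall(r"#\w+", text.lower())


def preprocess(text):
    """Remove punctuation, lower-case the text, and drop stop words."""
    words = text.translate(_NO_PUNCT).lower().split()
    return [w for w in words if w not in STOP_WORDS]


def top_items(items, n=10):
    """Return the n most frequent items with their counts."""
    return Counter(items).most_common(n)


def count_per_interval(timestamps, interval=60):
    """Count tweets per interval; each key is the start of its interval."""
    counts = Counter((int(ts) // interval) * interval for ts in timestamps)
    return dict(sorted(counts.items()))
```

Hashtags are extracted from the raw text because removing punctuation also strips the leading `#`, after which a hashtag is indistinguishable from an ordinary keyword.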

Database

Use MongoDB to store the analysis results

Visualization

The dashboard shows a timeline of related tweet counts, the most related hashtags, the most related keywords, and the ratio of positive to negative tweets.

Prerequisite

Resources

License

See the LICENSE file for license rights and limitations (MIT).


xuwenyihust/Twitter-Hashtag-Tracking
