---
layout: page
---
This program would use the Google Books Ngram dataset (http://aws.amazon.com/datasets/8172056142375670) and a second, similarly structured dataset built from insults, down-voted Reddit comments, and possibly 4chan comments. We would use n-gram models built from these datasets to generate:
- In the case of Google Books: short, hopefully coherent stories
- In the case of rude comments: even ruder, more offensive comments

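The plan doesn't say how sentences would be generated from n-grams. One common approach is a simple Markov chain. You record which word follows each run of (n-1) words, then repeatedly pick a next word for the current context. The following is a minimal sketch that assumes plain whitespace-tokenized text and trigrams. `build_model` and `generate` are hypothetical helper names. The sketch doesn't read the real Google Ngram files. Those files store counts rather than running text, so you would weight each follower by its count instead of keeping repeated list entries.

```python
import random
from collections import defaultdict


def build_model(tokens, n=3):
    """Map each (n-1)-word context to the list of words that followed it.

    Repeats are kept, so random.choice over a list samples the next word
    in proportion to how often it followed that context.
    """
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model


def generate(model, seed, length=20, rng=None):
    """Extend the seed one word at a time until `length` words or a dead end.

    `seed` must contain n-1 words, matching the context size of the model.
    """
    rng = rng or random.Random()
    output = list(seed)
    context_size = len(seed)
    while len(output) < length:
        context = tuple(output[-context_size:])
        followers = model.get(context)  # .get avoids inserting empty keys
        if not followers:
            break  # this context never appeared in the training text
        output.append(rng.choice(followers))
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ran off the mat".split()
    model = build_model(corpus, n=3)
    print(generate(model, seed=("the", "cat"), length=10, rng=random.Random(0)))
```

Every three-word window in the output appears somewhere in the training text. Larger n gives more coherent but less varied output, which matters for the "hopefully coherent stories" goal.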
If this worked, TrollBot would be programmed to respond to low-quality online comments (probably just on YouTube) with its own n-gram-generated sentences.

- Amazon article demonstrating the use of Hive on the Google Ngram dataset: http://aws.amazon.com/articles/5249664154115844
- Hive homepage: http://hive.apache.org
- Hadoop homepage (Hadoop must be installed to use Hive): http://hadoop.apache.org
- If you're really interested: http://www.amazon.com/Hadoop-Action-Chuck-Lam/dp/1935182196 (*Hadoop in Action* by Chuck Lam). I've read part of it, and it really helps explain what is going on under the hood. It's a relatively short book, about 300 pages.

For the next week, we will work through this article: http://aws.amazon.com/articles/5249664154115844
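The article above queries the Ngram dataset with Hive. Before setting that up, it may help to try the kind of aggregation involved on a few local rows. This is a minimal sketch. It assumes each record is a tab-separated line containing the n-gram, year, occurrence count, page count and book count, so check that layout against the dataset description before relying on it. `total_occurrences` is a hypothetical helper, and the sample rows are invented for illustration.

```python
from collections import Counter


def total_occurrences(lines):
    """Sum the occurrence counts for each n-gram across all years.

    Assumes each line is tab-separated as: ngram, year, occurrences, pages, books.
    """
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip malformed rows rather than failing the whole run
        gram, _year, occurrences, _pages, _books = fields
        totals[gram] += int(occurrences)
    return totals


if __name__ == "__main__":
    # Made-up sample rows in the assumed format.
    sample = [
        "once upon a\t1990\t120\t100\t40",
        "once upon a\t1991\t90\t70\t30",
        "upon a time\t1990\t200\t150\t60",
    ]
    for gram, count in total_occurrences(sample).most_common():
        print(gram, count)
```

In Hive, the same idea would be a `GROUP BY` on the n-gram column with `SUM` over the occurrence counts, run across the full dataset instead of a Python list.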