Text Analytics project using Python's NLTK library.
In this project, we work with the inaugural corpus from NLTK in Python. We look at the following inaugural addresses of Presidents of the United States of America:
- President Franklin D. Roosevelt in 1941
- President John F. Kennedy in 1961
- President Richard Nixon in 1973
1. Code snippet to extract the three speeches:
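import nltk
from nltk.corpus import inaugural

# Download the inaugural corpus (a no-op if it is already present)
nltk.download('inaugural')
# List the speeches available in the corpus
print(inaugural.fileids())
# Load the raw text of the three speeches into variables for later steps
roosevelt = inaugural.raw('1941-Roosevelt.txt')
kennedy = inaugural.raw('1961-Kennedy.txt')
nixon = inaugural.raw('1973-Nixon.txt')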
2. Removing all stopwords from the three speeches and showing the word count before and after removal.
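The step above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `tokenize` and `remove_stopwords` helpers, the small `SAMPLE_STOPWORDS` set and the sample sentence are all made up for the example. In the project the stopword list would more likely come from NLTK's `stopwords` corpus, and the input would be one of the speeches loaded earlier.

```python
import re

# Small illustrative stopword set (an assumption for this sketch). With NLTK,
# the list would typically be set(nltk.corpus.stopwords.words('english'))
# after running nltk.download('stopwords').
SAMPLE_STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "we", "our", "that"}


def tokenize(text):
    """Lowercase the text and keep runs of letters only (drops punctuation and digits)."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens, stop_words):
    """Return the tokens that are not in stop_words, preserving order."""
    return [t for t in tokens if t not in stop_words]


# Made-up sample sentence; replace it with the raw text of a speech.
sample = "We renew our faith in the spirit of the nation, and in the life of democracy."
tokens = tokenize(sample)
filtered = remove_stopwords(tokens, SAMPLE_STOPWORDS)

print("Word count before stopword removal:", len(tokens))
print("Word count after stopword removal:", len(filtered))
```

Passing the stopword set in as a parameter keeps the helper independent of where the list comes from, so the same function can be reused for all three speeches.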
3. Most frequently used words in the inaugural address for each president (after removing the stopwords)
• The most frequent word in President Roosevelt's 1941 inaugural address is "nation".
• The top three words by frequency of repetition were 'nation': 17 times, 'know': 10 times and 'peopl': 9 times. (Truncated forms such as 'peopl' appear to be word stems rather than full words.)
• Note that 'spirit', 'life', 'democraci' and 'becaus' also occur 9 times each, the same as 'peopl'. Since only the top three words were asked for, they are not listed among them.
• 'peopl' was included in the top three because it came first in the list among the words with a frequency of 9. Any of the other tied words could equally take its place.
• The most frequent word in President Kennedy's 1961 inaugural address is "let".
• The top three words by frequency of repetition were 'let': 16 times, 'us': 12 times and 'power': 9 times.
• The most frequent word in President Nixon's 1973 inaugural address is "us".
• The top three words by frequency of repetition were 'us': 26 times, 'let': 22 times and 'america': 21 times.
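The frequency ranking, including the tie at the third place noted for the 1941 address, can be illustrated with a short sketch built on `collections.Counter`. The helper names `top_words` and `top_words_with_ties` and the toy token list are hypothetical and chosen for the example. The sketch assumes the tokens have already been cleaned of stopwords.

```python
from collections import Counter


def top_words(tokens, n=3):
    """Return the n most frequent (word, count) pairs.

    Counter.most_common lists words with equal counts in the order they
    were first seen, so a tie at the cutoff is broken by position.
    """
    return Counter(tokens).most_common(n)


def top_words_with_ties(tokens, n=3):
    """Like top_words, but keep every word tied with the n-th count."""
    ranked = Counter(tokens).most_common()
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # count of the n-th ranked word
    return [(word, count) for word, count in ranked if count >= cutoff]


# Toy token list (not taken from a speech) with a three-way tie for third place.
toy_tokens = "nation know nation peopl spirit nation know life".split()

print(top_words(toy_tokens))
print(top_words_with_ties(toy_tokens))
```

Reporting the tied words alongside the top three avoids the arbitrary choice between words that occur equally often.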