Uses type/token ratio to calculate lexical diversity.
Pre-processing includes tokenising the input, removing stopwords, and applying nltk's Porter Stemmer to reduce words to their stems.
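The core calculation is the number of distinct types divided by the total number of tokens. A minimal sketch, where a toy regex tokeniser and a hard-coded stopword list stand in for nltk's tokeniser, stopword corpus and Porter Stemmer:

```python
import re

# Hypothetical illustration of the type/token ratio; the real script
# would stem each token with nltk's PorterStemmer before counting.
STOPWORDS = {"the", "a", "an", "and", "on", "of", "to", "in"}

def lexical_diversity(text):
    tokens = re.findall(r"[a-z']+", text.lower())      # crude tokenisation
    tokens = [t for t in tokens if t not in STOPWORDS]  # drop stopwords
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)               # types / tokens

print(round(lexical_diversity("the cat sat on the mat and the cat slept"), 4))
# After stopword removal: cat, sat, mat, cat, slept -> 4 types / 5 tokens = 0.8
```

Repeated words (here, "cat") lower the ratio, which is why lyrics with heavy refrains score low.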
Run:
python3 lexical_diversity_calculator.py -n SampleTexts/EdSheeranLyrics.txt
Output:
EdSheeranLyrics.txt lexical diversity: 0.2112
Finds the proportions of adjectives, verbs, nouns and adverbs in a text, categorising remaining types as 'other'.
Preprocessing involves tokenisation of input and removal of stopwords.
Uses nltk's part-of-speech (POS) tagger to assign a part of speech to each input token. Because nltk's POS tagger was trained on the Penn Treebank corpus, it uses the Treebank tag set; this script maps the Treebank tags to WordNet tags before giving the proportions as output.
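The mapping relies on Treebank tag prefixes (JJ* adjectives, VB* verbs, NN* nouns, RB* adverbs). A sketch of that step, with a hard-coded tagged sentence standing in for the output of nltk's pos_tag:

```python
from collections import Counter

def treebank_to_class(tag):
    # Map a Penn Treebank tag to a coarse WordNet-style word class.
    if tag.startswith("JJ"):
        return "Adjectives"
    if tag.startswith("VB"):
        return "Verbs"
    if tag.startswith("NN"):
        return "Nouns"
    if tag.startswith("RB"):
        return "Adverbs"
    return "Other"

# Illustrative tagged tokens; the real script gets these from nltk.pos_tag.
tagged = [("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"),
          ("quickly", "RB"), ("over", "IN"), ("dog", "NN")]

counts = Counter(treebank_to_class(tag) for _, tag in tagged)
for cls, n in counts.items():
    print(f"{cls}: {round(100 * n / len(tagged), 2)} %")
```

Matching on prefixes covers the inflected variants (e.g. VBD, VBZ, NNS) without enumerating every Treebank tag.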
Run:
python3 word_proportions.py -n SampleTexts/GulliversTravels.txt
Output:
Adjectives: 7.75 %
Verbs: 17.18 %
Nouns: 22.76 %
Adverbs: 5.6 %
Other: 46.7 %