Data and analysis scripts for "Finding Structure in Spelling and Pronunciation Using Latent Dirichlet Allocation", presented at NLP30 (2024)
Text files
- English spells (.csv)
- French spells (.csv)
- German spells (.csv)
- Russian spells (.csv)
- Swahili spells (.csv)
Scripts for analysis (Jupyter notebooks)
Execution has been confirmed on Python 3.9, 3.10, and 3.11.
Important Parameters:
- n_topics [integer]: number of topics for LDA
- doc_attr [string]: one of "spell", "sound"
- max_doc_size [integer]: maximum character length of documents to process
- term_type [string]: one of "1gram", "2gram", "3gram", "skippy2gram", "skippy3gram"
- ngram_is_inclusive [boolean]: a flag to make n-grams inclusive (lower-order n-grams are also included)
- max_distance_val [integer, bounded by max_doc_size]: maximum span of skippy n-gram links
- term_min_freq [integer]: a filter against overly infrequent terms (passed as the value of gensim's "minfreq")
- term_abuse_threshold [float, 0.0 to 1.0]: a filter against overly frequent terms (passed as the value of gensim's "abuse_threshold")
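For reference, the contiguous and skippy character n-grams controlled by term_type and max_distance_val can be sketched as follows. This is a hypothetical re-implementation for illustration, not the notebooks' actual code; the function names are ours:

```python
from itertools import combinations

def char_ngrams(word, n):
    """Contiguous character n-grams (term_type "1gram".."3gram")."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def skippy_ngrams(word, n, max_distance):
    """Possibly non-contiguous n-grams ("skippy2gram"/"skippy3gram").
    The span from first to last linked character is capped by
    max_distance (the max_distance_val parameter)."""
    return ["".join(word[i] for i in idx)
            for idx in combinations(range(len(word)), n)
            if idx[-1] - idx[0] <= max_distance]

# Example: char_ngrams("night", 2) yields ['ni', 'ig', 'gh', 'ht'],
# while skippy_ngrams("night", 2, 2) also links characters one apart.
```

With max_distance equal to n - 1, skippy n-grams reduce to ordinary contiguous n-grams; larger values add longer-range character pairings.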
Other parameters are not meant to be modified. Do so at your own risk.
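The two frequency filters above behave like gensim's Dictionary.filter_extremes(no_below, no_above). A minimal stdlib sketch of that behavior, hypothetical and not the repository's code:

```python
from collections import Counter

def filter_terms(docs, term_min_freq, term_abuse_threshold):
    """Drop terms that appear in fewer than term_min_freq documents,
    or in more than term_abuse_threshold (a fraction, 0.0 to 1.0)
    of all documents. Mirrors gensim's
    Dictionary.filter_extremes(no_below, no_above)."""
    doc_freq = Counter(t for doc in docs for t in set(doc))
    n_docs = len(docs)
    keep = {t for t, c in doc_freq.items()
            if c >= term_min_freq and c / n_docs <= term_abuse_threshold}
    return [[t for t in doc if t in keep] for doc in docs]
```

Raising term_min_freq removes rare, noisy terms; lowering term_abuse_threshold removes near-ubiquitous terms that carry little topical signal.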
Required Python packages
- pyLDAvis [recommended to install first]
- WordCloud
- plotly
- adjustText
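Assuming a standard pip environment, the packages above can be installed as follows (a sketch; the PyPI distribution names for WordCloud and adjustText are assumed to be "wordcloud" and "adjustText"):

```shell
# install pyLDAvis first, as recommended above
pip install pyLDAvis
# then the remaining dependencies
pip install wordcloud plotly adjustText
```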
Results (.html files)